Research on Probabilistic Reasoning and Statistical Learning Based on the Geometric Properties of Data
Abstract
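Probabilistic reasoning and statistical learning are important tools for uncovering associations and intrinsic relations among objects from data, and they form a challenging research area with many open difficulties. This thesis investigates key techniques of probabilistic reasoning and statistical learning. Its main thread and distinguishing feature is the use of geometric methods to describe the geometric properties of data, combined with probabilistic reasoning and statistical learning methods. It studies linear and support vector regression methods that exploit geometric correlations among data, adaptive learning of dynamic Bayesian networks with changing structures based on detecting the geometric structure of time series, dynamic Bayesian networks based on correlations between geometric patterns, and estimation of the number of clusters based on a two-cluster geometric model. The main contributions of this work are summarized as follows: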
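     1. Existing linear regression and support vector regression methods do not yet mine or exploit the correlations among the data of a single variable. To use these correlations to improve the predictive performance of regression models, a geometric correlation learning method (GcLearn) is proposed. A theoretical analysis of its predictive performance shows that GcLearn predicts better than traditional linear regression and support vector regression; the analysis also gives the conditions under which the method applies and a criterion for deciding whether they hold. Experimental results confirm the effectiveness of GcLearn. Its main innovations are: a method for mining geometric correlations among the data of a single variable, a geometric regression method at the level of curves, and a prediction method for regression models that uses geometric correlations.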
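     2. A method (autoDBN) is proposed that adaptively learns dynamic Bayesian networks with changing structures by detecting the geometric structure of time series. It addresses the problem of finding accurate model regions in multivariate time series data and learning accurate changing-structure dynamic Bayesian networks, and the resulting sequence of models adapts to the changing dependencies among the multivariate time series. The method overcomes two weaknesses of existing methods, namely the lack of a dedicated mechanism for finding model regions and blind search; experimental results show that it performs markedly better than existing methods. Its specific innovations are: a method for converting a time series into a curve manifold, together with methods for describing and detecting the geometric structure of a time series in order to segment it; a search strategy for determining reasonable model regions; and, finally, a model-revisiting mechanism based on a competition F-test that corrects possible errors in the resulting sequence of model regions and dynamic Bayesian network models.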
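     3. To discover gene regulatory relations in which the expression levels of different genes are correlated in their trends of change, a dynamic Bayesian network method based on correlations between geometric patterns (Gp-DBN) is proposed. It addresses the problem of discovering trend-correlation-based gene regulatory relations, and experiments on real gene expression data confirm its effectiveness. Its main innovations are: a method for converting gene expression time series into geometric patterns, which describe the rising and falling trends of a gene's expression level over time; and a method that represents geometric-pattern features by tangent vectors on the patterns, which efficiently obtains discrete features of the patterns, identifies regulators, and estimates regulatory time lags.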
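     4. When gene expression data are clustered with the PAM algorithm, existing methods for estimating the number of clusters perform poorly on complex cluster structures, for example when a small cluster lies close to a large one or when clusters overlap slightly. To address this, the system evolution method, an estimator of the number of clusters based on a two-cluster geometric model, is proposed. It handles the estimation of the number of clusters in these difficult cases, and experimental results show that it is markedly more accurate than existing methods. The system evolution method analyzes the whole cluster structure by examining whether the two closest clusters among all potential clusters (twin clusters) are separable, and it introduces a two-cluster geometric model for analyzing the separability of twin clusters. It also treats a dataset as a pseudo-thermodynamic system and determines the optimal number of clusters with system evolution rules based on the energy relations between twin clusters.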
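The abstract does not give the details of Gp-DBN's geometric patterns. As a hedged illustration only, the sketch below assumes the simplest reading of "tangent vectors describing rising and falling trends": a time series is viewed as the polyline through the points (t, x_t) with unit time steps, each segment's tangent vector (1, x_{t+1} - x_t) is discretized by its angle into rising (+1), falling (-1) or flat (0), and a candidate regulatory time lag is chosen by how often the regulator's trend at t matches the target's trend at t + lag. The names `trend_pattern`, `trend_agreement`, `best_lag`, the angle threshold `eps` (in radians) and the matching score are hypothetical choices, not the thesis's definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def trend_pattern(series: Sequence[float], eps: float = 0.1) -> List[int]:
    """Discretize each segment's tangent direction: +1 rising, -1 falling, 0 flat.

    Hypothetical sketch: the series is the polyline through (t, x_t) with unit
    time steps, so the tangent of segment t is (1, x_{t+1} - x_t).
    """
    labels = []
    for a, b in zip(series, series[1:]):
        angle = math.atan2(b - a, 1.0)  # angle of tangent vector (1, dx) with the time axis
        if angle > eps:
            labels.append(1)
        elif angle < -eps:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def trend_agreement(reg: Sequence[int], tgt: Sequence[int], lag: int) -> float:
    """Fraction of positions where the regulator trend at t equals the target trend at t + lag."""
    pairs = list(zip(reg, tgt[lag:]))
    if not pairs:
        return 0.0
    return sum(1 for r, g in pairs if r == g) / len(pairs)


def best_lag(reg_series: Sequence[float], tgt_series: Sequence[float],
             max_lag: int = 3, eps: float = 0.1) -> Tuple[int, float]:
    """Pick the lag (0..max_lag) with the highest trend agreement; ties go to the smaller lag."""
    reg = trend_pattern(reg_series, eps)
    tgt = trend_pattern(tgt_series, eps)
    scores: Dict[int, float] = {lag: trend_agreement(reg, tgt, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: (scores[k], -k))
    return lag, scores[lag]
```

With `reg = [0, 1, 2, 1, 0, 1, 2]` and `tgt = [5, 5, 6, 7, 6, 5, 6]`, the target repeats the regulator's rise, fall, rise trend one step later, so `best_lag(reg, tgt)` returns `(1, 1.0)`. An inhibitory relation would appear as systematically opposite labels, which this matching-only score does not capture.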
Probabilistic reasoning and statistical learning are important tools for exploring the inner relations among objects. This thesis systematically studies several key techniques of probabilistic reasoning and statistical learning, focusing on two aspects: describing the geometrical properties of data with geometrical methods, and probabilistic reasoning and statistical learning methods. The distinguishing feature of the work is the combination of these two aspects. The research topics include: linear and support vector regression based on mining geometrical correlations between data, adaptive learning of dynamic Bayesian networks (DBNs) with changing structures based on detecting the geometric structures of time series, dynamic Bayesian networks based on correlations between geometrical patterns, and estimation of the number of clusters (NC) based on a two-cluster geometrical model. The main contributions of this thesis are as follows:
     1. Linear regression (LR) and support vector regression (SVR) methods usually neglect mining and using the correlations between the data of a single variable. The geometrical correlation learning method (GcLearn) is proposed to enhance the prediction ability of regression models by using this correlation information. A theoretical analysis, which also states the conditions under which GcLearn applies, shows that it has better prediction ability than traditional LR and SVR. Experimental results show that GcLearn is effective. The proposed new methods include: a method of mining geometrical correlations between the data of a variable, a geometrical regression method at the level of curves, and a prediction method that uses geometrical correlations.
     2. An adaptive learning method (autoDBN) is proposed to learn DBNs with changing structures from multivariate time series. autoDBN learns a sequence of accurate model regions and DBNs with changing structures that adapt to the changing relations among multivariate time series. It overcomes two limitations of existing methods: the lack of a dedicated mechanism for detecting model regions, and blind search. Experimental results show that its performance is clearly better than that of existing methods. The proposed new methods include: segmenting time series by detecting their geometric structures; strategies for finding reasonable model regions; and a model revisiting method based on a competition F-test to rectify possible errors in the model regions and DBNs.
     3. A DBN method based on correlations between geometrical patterns (Gp-DBN) is proposed; it can discover gene regulatory relations based on trend correlations. Experimental results on real gene expression data show that Gp-DBN is effective. The new techniques include: a geometrical pattern of the time series of a gene, proposed to describe the varying trends of that gene's expression levels; and a method that uses tangent vectors to represent features of geometrical patterns, proposed to obtain discrete features of the patterns and to estimate potential regulators and time lags.
     4. The system evolution method (SE), based on a two-cluster geometrical model, is proposed to estimate NC for the PAM clustering algorithm. SE can estimate NC accurately in difficult cases where small clusters lie near larger clusters and/or clusters overlap slightly. Experimental results show that it outperforms existing methods on NC estimation. SE studies a cluster structure by examining the separability of the two closest clusters among all potential clusters (twin clusters), and a two-cluster geometrical model is proposed to analyze the separability of twin clusters. Furthermore, it regards a dataset as a pseudo-thermodynamic system, and evolution rules based on the energy relations of twin clusters are proposed to estimate the optimal NC.
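The abstract names a model revisiting method based on a competition F-test but does not define the test. To illustrate the general idea of using an F-test to decide whether two adjacent model regions should stay separate or be merged, the sketch below swaps in a standard Chow-style test on simple least-squares lines. It compares the residual sum of squares (RSS) of one line fitted over both regions with the combined RSS of separate lines, F = ((RSS_pooled - RSS_sep) / k) / (RSS_sep / (n - 2k)), with k = 2 parameters per line. This is an assumption for illustration, not the thesis's procedure; `linear_rss`, `chow_f`, `keep_boundary` and the critical value `f_crit` are hypothetical names, and `f_crit` should be read from an F table with (k, n - 2k) degrees of freedom at the chosen significance level.

```python
from typing import Sequence

K = 2        # parameters per fitted line (intercept and slope)
TOL = 1e-12  # residual sums below this are treated as exact fits


def linear_rss(x: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the ordinary least-squares line y = a + b*x.

    Requires at least two distinct x values.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def chow_f(x: Sequence[float], y: Sequence[float], split: int) -> float:
    """F statistic for 'one line over x' versus 'separate lines on x[:split] and x[split:]'."""
    rss_pooled = linear_rss(x, y)
    rss_sep = linear_rss(x[:split], y[:split]) + linear_rss(x[split:], y[split:])
    df = len(x) - 2 * K  # residual degrees of freedom of the two-line model
    if df <= 0:
        raise ValueError("need more than 2*K observations")
    if rss_sep <= TOL:
        # Both regions fit exactly: the split matters only if the pooled fit is not exact.
        return 0.0 if rss_pooled <= TOL else float("inf")
    return ((rss_pooled - rss_sep) / K) / (rss_sep / df)


def keep_boundary(x: Sequence[float], y: Sequence[float], split: int, f_crit: float) -> bool:
    """Keep the two regions separate if the F statistic exceeds the critical value."""
    return chow_f(x, y, split) > f_crit
```

For `x = list(range(8))` and `y = [0.0, 2.1, 3.9, 6.0, 6.1, 5.9, 6.0, 6.0]`, the first four points rise roughly along y = 2x and the last four stay near 6, so the pooled line fits poorly and `chow_f(x, y, 4)` is several hundred, far above any usual critical value. For data generated by a single line, the two separate fits barely improve on the pooled one, and the boundary would be dropped (the regions merged).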