Research on Nonlinear Methods for Protein Structural Class and Function Prediction and Species Phylogenetic Analysis
Abstract
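With the steady progress of biotechnology and the deepening of bioinformatics research, the volume of biological data grows exponentially every year. Relying solely on expensive and time-consuming biochemical experiments to analyze these massive data sets and the biological questions they raise is no longer realistic. To meet this need, reliable and efficient computational methods and algorithms are urgently required. Using methods from nonlinear science as models, this thesis studies several problems in protein structural class prediction, protein function prediction, and species phylogenetic analysis. The specific work is as follows: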
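     In Chapter 2, we study structural class prediction for low-homology proteins. Based on predicted protein secondary structure, we propose a new and simple kernel method for predicting protein structural classes. The secondary structure is predicted with the popular tool PSIPRED. A linear kernel is then constructed from secondary structure element alignment scores and used as a precomputed kernel to train a support vector machine classifier. Our method has no tunable parameters to train. Finally, the method is applied to two public low-homology datasets and achieves good classification results. Compared with existing methods, it not only improves the overall prediction accuracy but also distinguishes the α+β and α/β classes more accurately. This suggests that a linear kernel based on secondary structure element alignment scores captures the similarity between protein secondary structure sequences better than statistical information derived from secondary structure.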
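     In Chapter 3, we study protein subcellular localization. The subcellular location of a protein is closely related to its biological function. Amino acid composition is an important model for subcellular localization, but it ignores sequence-order information. To compensate for this shortcoming, we use recurrence quantification analysis and the Hilbert-Huang transform, which extract recurrence patterns and information at different frequencies from time series, respectively. To apply these two methods, we convert each amino acid sequence into two time series using the hydrophobic free energy and solvent accessibility of the amino acids. Together, the amino acid composition, recurrence quantification analysis, and Hilbert-Huang transform models produce 62 features, so each protein sequence is represented by a 62-dimensional feature vector. We rank these 62 features with the maximum relevance minimum redundancy method and again use an SVM as the classification model. The jackknife test is used to select the optimal feature subset and to evaluate the performance of the method. We test the method on three apoptosis protein datasets, and the results show that it achieves good prediction accuracy with relatively few features. This suggests that our method may complement existing methods.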
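     In Chapter 4, we study protein subnuclear localization, which is more challenging than subcellular localization. We design a new two-stage multiclass support vector machine and successfully apply it to the prediction of protein subnuclear locations. We combine two kinds of feature extraction: methods based on amino acid classification and methods based on the physicochemical properties of amino acids. To reduce computational complexity and feature redundancy, we propose a two-step optimal feature selection method to find the optimal feature subset. In our system, all classifiers are built from support vector machines with probability outputs. We use the radial basis kernel function, whose parameter is determined by an automatic optimization method, which further speeds up our method. A weighting strategy is used to handle unbalanced datasets. Finally, comparisons with existing methods on three test sets show that our method is more effective, and its results are better than those obtained with a single support vector machine classifier or with classifiers such as random forest.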
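     In Chapter 5, we study the phylogenetic relationships of vertebrates, using mitochondrial genomes as our data. We first represent the mitochondrial genomes with the chaos game representation (CGR) of DNA sequences. We then use two Markov chain models to simulate the mitochondrial genomes and treat them as candidate models of the noise background of genome sequences. Based on these two models, we construct alignment-free methods and apply them to the phylogenetic analysis of 64 vertebrates. We find that the second-order Markov chain model simulates the CGRs of mitochondrial genomes more precisely than the first-order model. However, the CGR of the first-order Markov chain model is better suited to representing the random background, and removing this background from the original CGRs enhances the evolutionary information in the mitochondrial genomes.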
With the development of biotechnology and bioinformatics, the volume of biological data has been increasing exponentially every year. It is no longer practical to analyze such massive data solely by performing expensive and time-consuming biochemical experiments. To meet this need, it is urgent to develop reliable and effective computational methods and algorithms. This thesis studies the prediction of protein structural classes and functions, and phylogenetic analysis, based on nonlinear science methods. The detailed work is summarized as follows:
     In Chapter 2, we study the prediction of structural classes of low-homology proteins. Based on predicted secondary structures, we propose a new and simple kernel method to predict protein structural classes. The secondary structures of all amino acid sequences are obtained with the tool PSIPRED. A linear kernel based on secondary structure element alignment scores is then constructed and used as a precomputed kernel function for training a support vector machine classifier, with no parameter tuning. The overall accuracies on two public low-homology datasets are higher than those obtained by other existing methods. In particular, our method achieves higher accuracies than other methods in differentiating the α+β class from the α/β class. We conclude that the linear kernel based on secondary structure element alignment scores captures the similarity between two secondary structure element sequences better than existing statistical information extracted from predicted secondary structures.
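The kernel construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's implementation: the element scoring in `pair_score` (same type scores the shorter length, coil against helix or strand scores half of it, helix against strand scores zero), the zero gap penalty, the normalization by mean sequence length, and the choice to build the linear kernel as dot products of per-protein score vectors are all assumptions made for this example. All function names are hypothetical.

```python
from itertools import groupby


def to_elements(ss):
    """Run-length encode a secondary structure string, e.g. 'HHHCC' -> [('H', 3), ('C', 2)]."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def pair_score(a, b):
    """Score for aligning two elements (assumed scheme, see lead-in)."""
    (type_a, len_a), (type_b, len_b) = a, b
    shorter = min(len_a, len_b)
    if type_a == type_b:
        return float(shorter)
    if "C" in (type_a, type_b):
        return 0.5 * shorter
    return 0.0  # helix aligned against strand


def ssea_score(ss1, ss2):
    """Global alignment of element sequences with zero gap penalty, normalized by mean length.

    Both strings are assumed to be non-empty.
    """
    e1, e2 = to_elements(ss1), to_elements(ss2)
    dp = [[0.0] * (len(e2) + 1) for _ in range(len(e1) + 1)]
    for i in range(1, len(e1) + 1):
        for j in range(1, len(e2) + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1] + pair_score(e1[i - 1], e2[j - 1]),
            )
    return dp[-1][-1] / ((len(ss1) + len(ss2)) / 2.0)


def score_vectors(structures):
    """Represent each protein by its alignment scores against all proteins in the set."""
    return [[ssea_score(a, b) for b in structures] for a in structures]


def linear_kernel(vectors):
    """Gram matrix of plain dot products between score vectors."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]


structures = ["HHHCC", "HHCCC", "EEEEC"]  # toy three-state strings (H, E, C)
K = linear_kernel(score_vectors(structures))
```

A Gram matrix like `K` is what an SVM with a precomputed kernel expects, for example scikit-learn's `SVC(kernel="precomputed")`.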
     In Chapter 3, we study the subcellular localization of proteins. The function of a protein is closely related to its subcellular location. Amino acid composition is one of the important models for protein subcellular localization, but it ignores sequence-order information. To make up for this deficiency, we add two methods, recurrence quantification analysis and the Hilbert-Huang transform, which extract recurrence patterns and frequency information from time series. To make use of these two methods, we convert each amino acid sequence into two time series using the hydrophobic free energies and solvent accessibilities of the 20 amino acids. The ensemble model of amino acid composition, recurrence quantification analysis, and the Hilbert-Huang transform generates 62 features, so each amino acid sequence is represented by a 62-dimensional feature vector. All features are ranked by the maximum relevance and minimum redundancy method, and a support vector machine is again used as the classifier. The jackknife test is used to select the optimal feature subset and to evaluate and compare our method with other existing methods. Our method is tested on three apoptosis protein datasets. The final results show that our method achieves the best performance while using relatively few features. This suggests that our method may complement the existing methods.
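The first steps of this pipeline (composition features, conversion to a numeric series, and a recurrence measure) can be sketched as follows. This is a simplified illustration, not the thesis's code: the Kyte-Doolittle hydropathy scale is used here only as a stand-in for the hydrophobic free energy values, `recurrence_rate` is the simplest recurrence quantification measure computed without time-delay embedding, and all function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used as a stand-in scale (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the 20 relative frequencies of the standard amino acids, in AMINO_ACIDS order."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def to_time_series(seq, scale):
    """Map each residue to its value on a physicochemical scale, skipping unknown symbols."""
    return [scale[residue] for residue in seq.upper() if residue in scale]


def recurrence_rate(series, eps):
    """Fraction of ordered pairs (i, j), i != j, whose values lie within eps of each other."""
    n = len(series)
    if n < 2:
        return 0.0
    recurrent = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and abs(series[i] - series[j]) < eps
    )
    return recurrent / (n * (n - 1))


sequence = "MKVLAAGIVGLL"  # toy sequence, not from any dataset
features = amino_acid_composition(sequence)
series = to_time_series(sequence, KYTE_DOOLITTLE)
rate = recurrence_rate(series, eps=0.5)
```

The composition vector discards residue order, while the series keeps it, which is why order-sensitive measures such as recurrence statistics can add information the composition alone cannot.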
     In Chapter 4, we study the subnuclear localization of proteins. Compared with subcellular localization, subnuclear localization is more challenging. A novel two-stage multiclass support vector machine is proposed and successfully applied to predict protein subnuclear locations. It considers only feature extraction methods based on amino acid classifications and physicochemical properties. To reduce computational complexity and feature redundancy, we propose a two-step optimal feature selection process to find the optimal feature subset. In our system, all classifiers are constructed using support vector machines with probability output. We use the radial basis kernel function, whose parameter is determined by an automatic optimization method to speed up our system. A weighting strategy is used to handle the unbalanced datasets. The results on three datasets show that our ensemble method is valuable and effective for predicting protein subnuclear locations compared with existing methods for the same problem, and that it performs better than popular machine learning classifiers such as support vector machine and random forest.
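Two ingredients named above, the radial basis kernel and class weighting for unbalanced data, can be written down compactly. The sketch below is an assumption-laden illustration: the abstract does not state the weighting formula, so the inverse-frequency rule used in `balanced_class_weights` is a common heuristic rather than the thesis's strategy, and both function names are hypothetical.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma):
    """Radial basis kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


labels = ["nucleolus", "nucleolus", "nucleolus", "speckle"]  # toy labels
weights = balanced_class_weights(labels)
similarity = rbf_kernel([0.0, 1.0], [1.0, 1.0], gamma=0.5)
```

With weights of this kind, errors on rare classes cost more during training, which counteracts the tendency of a classifier to favor the majority class.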
     In Chapter 5, we study vertebrate phylogeny based on mitochondrial genomes. The mitochondrial genomes are represented by the chaos game representation (CGR), a tool for DNA sequence representation. Two Markov chain models are then used to simulate the CGRs of mitochondrial genomes and are considered as candidate models of the noise background. Alignment-free methods are constructed based on the two Markov chain models and are applied to analyze the phylogeny of 64 selected vertebrates. Finally, we conclude from the results that the second-order Markov chain model is more powerful than the first-order Markov chain model in simulating the CGRs of the mitochondrial genomes, while the CGRs simulated by the first-order Markov chain model are more suitable for modeling the random background and can be subtracted from the original CGRs to enhance the phylogenetic information in the mitochondrial genomes.
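The chaos game representation and a first-order Markov background can be illustrated with a short sketch. The corner assignment in `CORNERS`, the starting point (0.5, 0.5), and the estimate of transition probabilities from dinucleotide and single-base frequencies are assumptions chosen for this example, and the function names are hypothetical. The abstract does not specify these details.

```python
from collections import Counter

# Assumed corner layout of the unit square for the four bases.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Each point is the midpoint between the previous point and the corner of the next base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def kmer_frequencies(seq, k):
    """Relative frequencies of the k-mers made only of A, C, G, T."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    kmers = [w for w in kmers if set(w) <= set("ACGT")]
    n = len(kmers)
    return {w: c / n for w, c in Counter(kmers).items()}


def first_order_markov_expected(seq, k):
    """Expected k-mer frequencies if each base depended only on the base before it."""
    p1 = kmer_frequencies(seq, 1)
    p2 = kmer_frequencies(seq, 2)
    expected = {}
    for word in kmer_frequencies(seq, k):
        prob = p1[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= p2[a + b] / p1[a]  # approximate P(b | a)
        expected[word] = prob
    return expected


genome = "ACGTACGGTACCA"  # toy sequence, not a real mitochondrial genome
observed = kmer_frequencies(genome, 3)
background = first_order_markov_expected(genome, 3)
signal = {w: observed[w] - background[w] for w in observed}
```

Counting CGR points in a 2^k by 2^k grid corresponds to counting k-mers, so subtracting a background at the k-mer level, as in `signal`, mirrors the idea of removing a simulated CGR from the original one.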
引文
[1]郝柏林,张淑誉.生物信息学基础手册[M].上海:科学技术出版社,2002.
    [2]许忠能.生物信息学[M].北京:清华大学出版社,2008.
    [3]张阳德.生物信息学[M].北京:科学出版社,2009.
    [4]孙啸,陆祖宏,谢建明.生物信息学基础[M].北京:清华大学出版社,2005.
    [5] D.R.韦斯特海德, J.H.帕里什, R.M.特怀曼著;王明怡,杨益,吴平等译.生物信息学[M].北京:科学出版社,2005.
    [6] D.E. Krane, M.L. Raymer著;孙啸,陆祖宏,谢建明等译.生物信息学概论[M].北京:清华大学出版社,2004.
    [7] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, et al. GenBank[J]. Nucleic Acids Res.,2003,31(1):23-27.
    [8] G. Stoesser, W. Baker, A. van den Broek, et al. The EMBL Nucleotide SequenceDatabase[J]. Nucleic Acids Res.,2001,29(1):17-21.
    [9] Y. Tateno, S. Miyazaki, M. Ota, et al. DNA data bank of Japan (DDBJ) in collabrationwith mass sequeneing teams[J]. Nucleic Acids Res.,2000,28:24-26.
    [10] A. Rzhetsky, M. Nei. A Simple Method for Estimating and Testing Minimum-EvolutionTrees[J]. Mol. Biol. Evol.,1992,9(5):945-967.
    [11] W.M. Fitch, E. Margoliash. Construction of phylogenetic trees[J]. Science,1967,155:279-284.
    [12] N. Saitou, M. Nei. The neighbor joining method: a new method for reconstruction phy-logenetic trees[J]. Mol. Biol. Evol.,1987,4:406-425.
    [13] P.H.A. Sneath, R.R. Sokal. Numerical taxonomy[M]. San Francisco: W.H. Freeman andCompany,1973.
    [14] J.A. Lake. A rate-independent technique for analysis of nucleic acid sequences: evolu-tionary parsimony[J]. Mol. Biol. Evol.,1987,4(2):167-191.
    [15] J. Felsenstein. Evolutionary trees from DNA sequences: a maximum likelihood ap-proach[J]. J. Mol. Evol.,1981,17:368-376.
    [16] W.M. Fitch. On the problem of discovering the most parsimonious tree[J]. Am. Nat.,1977,111:223-257.
    [17] M. Levitt, C. Chothia. Structural patterns in globular proteins[J]. Nature,1976,261:552-558.
    [18] A. Andreeva, D. Howorth, S.E. Brenner, T.J. Hubbard, C. Chothia, A.G. Murzin. SCOPdatabase in2004: refinements integrate structure and sequence family data[J]. NucleicAcids Res.,2004,32:D226-229.
    [19] A.G. Murzin, S.E. Brenner, T. Hubbard, C. Chothia. SCOP: a structural classification ofproteins database for the investigation of sequences and structures[J]. J. Mol. Biol.,1995,247:536-540.
    [20] P.H. Raven, G.B. Johnson. Understanding Biology[M]. William C Brown Communica-tions,1995.
    [21] C. Anfinsen. Principles that govern thefolding of protein chains[J]. Science,1973,181:223-230.
    [22] K.C. Chou. A novel approach to predicting protein structural classes in a (20-1)-D aminoacid composition space[J]. Proteins,1995,21:319-344.
    [23] I. Bahar, A.R. Atilgan, R.L. Jernigan, B. Erman. Understanding the recognition of proteinstructural classes by amino acid composition[J]. Proteins,1997,29:172-185.
    [24] K. Nishkawa, T. Ooi. Correlation of the amino acid composition of a protein to itsstructural and biological characters[J]. J. Biochem.,1982,91:1821-1824.
    [25] K.C. Chou, C.T. Zhang. Predicting of protein structural class[J]. Crit. Rev. Biochem.Mol. Biol.,1995,30:275-349.
    [26] Z.X. Wang, Z. Yuan. How good is the prediction of protein structural class by thecomponent-coupled method?[J]. Proteins,2000,38:165-175.
    [27] K.C. Chou. Prediction of protein cellular attributes using pseudo amino acid composi-tion[J]. Proteins,2001,43:246-255.
    [28] C. Chen, Y.X. Tian, X.Y. Zou, P.X. Cai, J.Y. Mo. Using pseudo amino acid compositionand support vector machine to predict protein structural class[J]. J. Theor. Biol.,2006,243:444-448.
    [29] T.L. Zhang, Y.S. Ding, K.C. Chou. Prediction protein structural classes with pseudo-amino acid composition: Approximate entropy and hydrophobicity pattern[J]. J. Theor.Biol.,2008,250:186-193.
    [30] X. Xiao, S. Shao, Z. Huang, K.C. Chou. Using pseudo amino acid composition to predictprotein structural classes: approached with complexity measure factor[J]. J. Comput.Chem.,2006,27:478-482.
    [31] T.G. Liu, X.Q. Zheng, J. Wang. Prediction of protein structural class using a complexity-based distance measure[J]. Amino Acids,2010,38:721-728.
    [32] J.B. Xia, S.L. Zhang, F. Shi, H.J. Xiong, X.H. Hu, X.H. Niu, Z. Li. Using the concept ofpseudo amino acid composition to predict resistance gene against Xanthomonas oryzaepv. oryzae in rice: An approach from chaos games representation[J]. J. Theor. Biol.,2011,284:16-23.
    [33] L.A. Kurgan, L. Homaeian. Prediction of structural classes for protein sequences anddomain-impact of prediction algorithms, sequence representation and homology, and testprocedures on accuracy[J]. Pattern Recognit.,2006,39:2323-2343.
    [34] S. Costantini, A.M. Facchiano. Prediction of the protein structural class by specific pep-tide frequencies[J]. Biochimie,2009,91:226-229.
    [35] K.D. Kedarisetti, L. Kurgan, S. Dick. Classifier ensembles for protein structural classprediction with varying homology[J]. Biochem. Biophys. Res. Commun.,2006,348:981-988.
    [36] J.Y. Yang, Z.L. Peng, Z.G. Yu, R.J. Zhang, V. Anh, D.S.Wang. Prediction of proteinstructural classes by recurrence quantification analysis based on chaos game representa-tion[J]. J. Theor. Biol.,2009,257:618-626.
    [37] K.C. Chou, Y.D. Cai. Predicting protein structural class by functional domain composi-tion[J]. Biochem. Biophys. Res. Commun.,2004,321:1007-1009.
    [38] K. Chen, L.A. Kurgan, J.S. Ruan. Prediction of protein structural class using novel evolu-tionary collocation-based sequence representation[J]. J. Comput. Chem.,2008,29:1596-1604.
    [39] T. Liu, X. Zheng, J. Wang. Prediction of protein structural class for low-similaritysequences using support vector machine and PSI-BLAST profile[J]. Biochimie,2010,92:1330-1334.
    [40] L.A. Kurgan, T. Zhang, H. Zhang, S.Y. Shen, J.S. Ruan. Secondary structure-basedassignment of the protein structural classes[J]. Amino Acids,2008,35:551-564.
    [41] L. Kurgan, K. Cios, K. Chen. SCPRED: accurate prediction of protein structural class forsequences of twilight-zone similarity with predicting sequences[J]. BMC Bioinformatics,2008,9:226.
    [42] M.J. Mizianty, L. Kurgan. Modular prediction of protein structural classes from se-quences of twilight-zone identity with predicting sequences[J]. BMC Bioinformatics,2009,10:414.
    [43] J.Y. Yang, Z. Peng, X. Chen. Prediction of protein structural classes for low-homologysequences based on predicted secondary structure[J]. BMC Bioinformatics,2010,11:S9.
    [44] S.L. Zhang, S.Y. Ding, T.M. Wang. High-accuracy prediction of protein structural classfor low-similarity sequences based on predicted secondary structure[J]. Biochimie,2011,93:710-714.
    [45] S.Y. Ding, S.L. Zhang, Y. Li, T.M. Wang. A novel protein structural class-es prediction method based on predicted secondary structure[J]. Biochimie,2012,doi:10.1016/j.bbr.2011.03.031.
    [46] T. Przytycka, R. Aurora, G.D. Rose. A protein taxonomy based on secondary structure[J].Nat. Struct. Biol.,1999,6:672-682.
    [47] L.J. Mcguffin, K. Bryson, D.T. Jones. What are the baselines for protein fold recogni-tion?[J]. Bioinformatics,2001,17:63-72.
    [48] RCSB Protein Data Bank. http://www.rcsb.org/pdb/home/home.do.
    [49] D.T. Jones. Protein secondary structure prediction based on position specific scoringmatrices[J]. J. Mol. Biol.,1999,292:195-202.
    [50] F. Birzele, S. Kramer. A new representation for protein secondary structure predictionbased on frequent patterns[J]. Bioinformatics,2006,22:2628-2634.
    [51] S.F. Altschul, T.L. Madden, A.A. Scha¨ffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman.Gapped BLAST and PSI-BLAST: a new genera-tion of protein database search program-s[J]. Nucleic Acids Res.,1997,25:3389-3402.
    [52] P. Fontana, E. Bindewald, S. Toppo, R. Velasco, G. Valle, S.C. Tosatto. The SSEA serverfor protein secondary structure alignment[J]. Bioinformatics,2005,21:393-395.
    [53] R.X. Yan, Z. Chen, Z.D. Zhang. Outer membrane proteins can be simply identified usingsecondary structure element alignment[J]. BMC Bioinformatics,2011,12:76.
    [54] J.K. Kim, G.P.S. Raghava, S.Y. Bang, S. Choi. Prediction of subcellular localizationof proteins using pairwise sequence alignment and support vector machine[J]. PatternRecog. Lett.,2006,27:996-1001.
    [55] L. Liao, W.S. Noble. Combining pairwise sequence similarity and support vector ma-chines for detecting remote protein evolutionary and structural relationships[J]. J. Com-put. Biol.,2003,10:857-868.
    [56] M.W. Mak, J. Guo, S.Y. Kung. PairProSVM: protein subcellular localization based on lo-cal pairwise profile alignment and SVM[J]. IEEE/ACM Trans. Comput. Biol. Bioinform.,2008,5:416-422.
    [57] V.N. Vapnik. The Nature of Statistical Learning Theory[M]. Springer,1995.
    [58] C.M. Bishop. Pattern Recognition and Machine Learning[M]. Springer,2006.
    [59] J.C. Platt. Fast Training of Support Vector Machines using Sequential Minimal Opti-mization[M]. In Advances in Kernel Methods-Support vector Learning. Cambridge: TheMIT Press;1999:185-208.
    [60] J.C. Platt, N. Cristianini, J. Shawe-Taylor. Large margin DAGs for multiclass classifica-tion[M]. In Advances in Neural Information Processing Systems. Volume12. Edited bySolla SA, Leen TK, Mu¨ller KR. Cambridge: The MIT Press;2000:547-553.
    [61] C.C. Chang, C.J. Lin. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/cjlin/papers/libsvm.pdf
    [62]马军伟.基于机器学习方法的蛋白质亚细胞定位预测研究[D].辽宁:大连理工大学,2011.
    [63]沈红斌.数据挖掘的建模及在生物信息学中的应用研究[D].上海:上海交通大学,2006.
    [64] M.D. Jacobson, M. Weil, M.C. Raff. Programmed cell death in animal development[J].Cell,1997,88:347-354.
    [65] J.F. Kerr, A.H. Wyllie, A.R. Currie. Apoptosis: a basic biological phenomenon withwide-ranging implications in tissue kinetics[J]. Brit. J. Cancer,1972,26:239-257.
    [66] J.M. Adams, S. Cory. The Bcl-2protein family: arbiters of cell survival[J]. Science,1998,281:1322-1326.
    [67] G. Evan, T. Littlewood. A matter of life and cell death[J]. Science,1998,281:1317-1322.
    [68] J.C. Reed, G. Paternostro. Postmitochondrial regulation of apoptosis during heart fail-ure[J]. Proc. Natl. Acad. Sci. USA,1999,96:7614-7616.
    [69] M. Raff. Cell suicide for beginners[J]. Nature,1998,396:119-122.
    [70] J.B. Schulz, M. Weller, M.A. Moskowitz. Caspases as treatment targets in stroke andneurodegenerative diseases[J]. Ann. Neurol.,1999,45:421-429.
    [71] M. Suzuki, R.J. Youle, N. Tjandra. Structure of Bax: coregulation of dimer formationand intracellular localization[J]. Cell,2000,103:645-654.
    [72] A. Bairoch, R. Apweiler. The SWISS-PROT protein sequence data bank and its supple-ment TrEMBL[J]. Nucleic Acids Res.,2000,25:31-36.
    [73] L.J. Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H.H.Staerfeldt, K. Rapacki, C. Workman, et al.. Prediction of human protein functionfrom post-translational modifications and localization features[J]. J. Mol. Biol.,2002,319:1257-1265.
    [74] Z.G. Yu, Q.J. Xiao, L. Shi, J.W. Yu, V. Anh. Chaos game representation of functionalprotein sequences, and simulation and multifractal analysis of induced measures[J]. Chin.Phys. B,2010,19(6):068701.
    [75] S.M. Zhu, Z.G. Yu, V. Anh. Protein structural classification and family identification bymultifractal analysis and wavelet spectrum[J]. Chin. Phys. B,2011,20(1):010505.
    [76] K.C. Chou, Y.D. Cai. Prediction of protein subcellular locations by GO-FunD-PseAApredictor[J]. Biochem. Biophys. Res. Commun.,2004,320:1236-1239.
    [77] A. Reinhardt, T. Hubbard. Using neural networks for prediction of the subcellular loca-tion of proteins[J]. Nucleic Acids Res.,1998,26:2230-2236.
    [78] S.J. Hua, Z.R. Sun. Support vector machine approach for protein subcellular localizationprediction[J]. Bioinformatics,2001,17:721-728.
    [79] Y. Huang, Y.D. Li. Prediction of protein subcellular locations using fuzzy k-NNmethod[J]. Bioinformatics,2001,21:21-28.
    [80] S. Briesemeister, Rahnenfu¨hrer J, O. Kohlbacher. Going from where to why-interpretableprediction of protein subcellular localization[J]. Bioinformatics,2010,26:1232-1238.
    [81] X. Xiao, S. Shao, Y. Ding, Z. Huang, K.C. Chou. Using cellular automata images andpseudo amino acid composition to predict protein subcellular location[J]. Amino Acids,2006,30:49-54.
    [82] K.C. Chou, H.B. Shen. Predicting Protein Subcellular Location by Fusing Multiple Clas-sifiers[J]. J. Cell Biochem.,2006,99:517-527.
    [83] Z. Yuan. Prediction of protein subcellular locations using Markov chain models[J]. FEBSLett.,1999,451:23-26.
    [84] Q. Xu, D.H. Hu, H. Xue, W.C. Yu, Q. Yang. Semi-supervised protein subcellular local-ization[J]. BMC Bioinformatics,2009,10(Suppl1):S47.
    [85] T.Q. Tung, D. Lee. A method to improve protein subcellular localization predictionby integrating various biological data sources[J]. BMC Bioinformatics,2009,10(Suppl1):S43.
    [86] A. Ho¨glund, P. Do¨nnes, T. Blum, H.W. Adolph, O. Kohlbacher. MultiLoc: predictionof protein subcellular localization using N-terminal targeting sequences, sequence motifsand amino acid composition[J]. Bioinformatics,2006,22:1158-1165.
    [87] K.C. Chou, H.B. Shen. A New Method for Predicting the Subcellular Localization ofEukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc2.0[J]. PLoS ONE,2010,5:e9931.
    [88] G.P. Zhou, K. Doctor. Subcellular location prediction of apoptosis proteins[J]. ProteinsStruct. Funct. Genet.,2003,50:44-48.
    [89] A. Bulashevska, R. Eils. Predicting protein subcellular locations using hierarchical en-semble of Bayesian classifiers based on Markov chains[J]. BMC Bioinformatics,2006,7:298.
    [90] Z.H. Zhang, Z.H. Wang, Z.R. Zhang, Y.X. Wang. A novel method for apoptosis proteinsubcellular localization prediction combining encoding based on grouped weight andsupport vector machine[J]. FEBS Lett.,2006,580:6169-6174.
    [91] Y.S. Ding, T.L. Zhang. Using Chou’s pseudo amino acid composition to predict subcellu-lar localization of apoptosis proteins: an approach with immune genetic algorithm-basedensemble classifier[J]. Pattern Recogn. Lett.,2008,29:1887-1892.
    [92] Y.L. Chen, Q.Z. Li. Prediction of the subcellular location of apoptos proteins[J]. J. Theor.Biol.,2007,245:775-783.
    [93] Y.L. Chen, Q.Z. Li. Prediction of apoptosis protein subcellular location using improvedhybrid approach and pseudo amino acid composition[J]. J. Theor. Biol.,2007,248:377-381.
    [94] L. Zhang, B. Liao, D.C. Li, W. Zhu. A novel representation for apoptosis protein sub-cellular localization prediction using support vector machine[J]. J. Theor. Biol.,2009,259:361-365.
    [95] Q. Gu, Y.S. Ding, X.Y. Jiang, T.L. Zhang. Prediction of subcellular location apoptosisproteins with ensemble classifier and feature selection[J]. Amino Acids,2008,38:974-983.
    [96] J.C.L. Webber, J.P. Zbilut. Dynamical assessment of physiological systems and statesusing recurrence plot strategies[J]. J. Appl. Physiol.,1994,76:965-973.
    [97] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, S.H. Shih, Q. Zheng, et al. The empiricalmode decomposition and the Hilbert spectrum for nonlinear and nonstationary time seriesanalysis[J]. Proc. R. Soc.,1998,454:903-995.
    [98] Z.G. Yu, V. Anh, K.S. Lau, L.Q. Zhou. Clustering of protein structures using hydrophobicfree energy and solvent accessibility of proteins[J]. Phys. Rev. E,2006,73:031920.
    [99] K.A. Selz, A.J. Mandell, M.F. Shlesinger. Hydrophobic free energy eigenfunctions ofpore, channel and transporter proteins contain beta-burst patterns[J]. Biophys. J.,1998,75:2332.
    [100] D. Bordo, P. Argos. Suggestions for “safe” residue substitutions in site-directed muta-genesis[J]. J. Mol. Biol.,1991,217:721-729.
    [101] H. Peng, F. Long, C. Ding. Feature selection based on mutual information: criteriaof max-dependency, max-relevance, and min-redundancy[J]. IEEE Tran. Pattern Anal.,2005,27:1226-1238.
    [102] K.C. Chou, H.B. Shen. Recent progress in protein subcellular location prediction[J].Anal. Biochem.,2007,370:1-16.
    [103] H. Nakashima, K. Nishikawa. Discrimination of intracellular and extracellular protein-s using amino acid composition and residue-pair frequencies[J]. J. Mol. Biol.,1994,238:54-61.
    [104] J. Cedano, P. Aloy, J.A. Perez-Pons, E. Querol. Relation between amino acid composi-tion and cellular location of proteins[J]. J. Mol. Biol.,1997,266:594-600.
    [105] J.P. Eckmann, S.O. Kamphorst, D. Ruelle. Recurrence plots of dynamical systems[J].Europhys. Lett.,1987,4:973-977.
    [106] M.A. Riley, G.C. Van Orden. Tutorials in contemporary nonlinear methods for the be-havioral sciences[M]. March1,2005, Retrieved from http://www.nsf.gov/sbe/bcs/pac/nmbs/nmbs.jsp.
    [107] A. Giuliani, R. Benigni, J.P. Zbilut, J.C.L. Webber, P. Sirabella, A. Colosimo. Nonlinearsignal analysis methods in the elucidation of protein sequence-structure relationships[J].Chem. Rev.,2002,102:1471-1491.
    [108] A. Giuliani, P. Sirabella, R. Benigni, A. Colosimo. Mapping protein sequence spaces byrecurrence: a case study on chimeric structures[J]. Protein Eng.,2000,13:671-678.
    [109] C. Manetti, M.A. Ceruso, A. Giuliani, J.C.L. Webber, J.P. Zbilut. Recurrence quantifi-cation analysis as a tool for the characterization of molecular dynamics simulations[J].Phys. Rev. E,1999,59:992-998.
    [110] J.C.L. Webber, A. Giuliani, J.P. Zbilut, A. Colosimo. Elucidating protein secondarystructures using alpha-carbon recurrence quantifications[J]. Proteins,2001,3:292-303.
    [111] Y. Zhou, Z.G. Yu, V. Anh. Cluster protein structures using recurrence quantifica-tion analysis on coordinates of alpha-carbon atoms of proteins[J]. Phys. Lett. A,2007,368:314-319.
    [112] Y.C. Yang, W. Tantoso, K.B. Li. Remote protein homology detection using recurrencequantification analysis and amino acid physicochemical properties[J]. J. Theor. Biol.,2008,252:145-154.
    [113] J.Y. Yang, X. Chen. Improving taxonomy-based protein fold recognition by using globaland local features[J]. Proteins,2011,79:2053-2064.
    [114] G.S. Han, Z.G. Yu, V. Anh. Predicting the subcellular location of apoptosis proteinsbased on recurrence quantification analysis and the Hilbert-Huang transform[J]. Chin.Phys. B2011,20:100504.
    [115] N. Marwan, M.C. Romano, M. Thiel, J. Kurths. Recurrence plots for the analysis ofcomplex systems[J]. Phys. Rep.,2007,438:237-329.
    [116] L. Cao. Practical method for determining the minimum embedding dimension of a scalartime series[J]. Physica D,1997,110:43-50.
    [117] G. Rilling, P. Flandrin, P. Goncalves. On empirical mode decomposition and its algo-rithms. IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing NSIP-03,Grado (I);2003.
    [118] N.E. Huang, M.C. Wu, S.R. Long, S.S.P. Shen, W. Qu, P. Gloersen, et al. A confidencelimit for the empirical mode decomposition and Hilbert spectrum analysis[J]. Proc. R.Soc.,2003,459:2317-2345.
    [119] Z.G. Yu, V. Anh, Y. Wang, D. Mao, J. Wanliss. Modelling and simulation of the horizon-tal component of the geomagnetic field by fractional stochastic differential equations inconjunction with empirical mode decomposition[J]. J. Geophys. Res.,2010,115:A10219.
    [120] F. Shi, Q.J. Chen, N.N. Li. Hilbert Huang transform for predicting proteins subcellularlocation[J]. J. Biomed. Sci. Eng.,2008,1:59-63.
    [121] T. Huang, X.H. Shi, P. Wang, Z.S. He, K.Y. Feng, L.L. Hu, X.Y. Kong, Y.X. Li, Y.D.Cai, K.C. Chou. Analysis and Prediction of the Metabolic Stability of Proteins Basedon Their Sequential Features, Subcellular Locations and Interaction Networks[J]. PLoSONE,2010,5:e10972.
    [122] X.L. Li, D. Li, Z.H. Liang, L.J. Voss, J.W. Sleigh. Analysis of depth of anesthesia withHilbert–Huang spectral entropy[J]. Clin. Neurophysiol.,2008,119(11):2465–2475.
    [123] J.Y. Shi, S.W. Zhang, Q. Pan, Y.M. Cheng, J. Xie. Prediction of protein subcellularlocalization by support vector machines using multi-scale energy and pseudo amino acidcomposition[J]. Amino Acids,2007,33:69-74.
    [124] J. Huang, F. Shi, H.B. Zhou. Support vector machine for predicting apoptosis proteinstypes by incorporating protein instability index[J]. China J. Bioinformatics,2005,3:121-123.
    [125] Z.D. Lei, Y. Dai. An SVM-based system for predicting protein subnuclear localization-s[J]. BMC Bioinformatics,2005,6:291.
    [126] S.Y. Mei, W. Fei. Amino acid classification based spectrum kernel fusion for proteinsubnuclear localization[J]. BMC Bioinformatics,2010,11(Suppl1):S17.
    [127] H.B. Shen, K.C. Chou. Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition[J]. Biochem. Biophys.Res. Commun.,2005,337:752-756.
    [128] R.D. Phair, T. Misteli. High mobility of proteins in the mammalian cell nucleus[J].Nature,2000,404:604-609.
    [129] R.F. Murphy, M.V. Boland, M. Velliste. Towards a systematics for protein subcellularlocation: quantitative description of protein localization patterns and automated analysisof fluorescence microscope images[J]. Proc. Int. Conf. Intell. Syst. Mol. Biol.,2000,8:251–259.
    [130] O. Emanuelsson, H. Nielsen,S. Brunak, G. von Heijne. Predicting subcellular localiza-tion of proteins based on their N-terminal amino acid sequence[J]. J. Mol. Biol.,1997,300:1005-1016.
    [131] O. Emanuelsson, S. Brunak, G. von Heijne, H. Nielsen. Locating proteins in the cellusing TargetP, SignalP, and related tools[J]. Nat. Protoc.,2007,2:953-971.
    [132] W.L. Huang, C.W. Tung, H.L. Huang, S.F. Hwang, S.Y. Ho. ProLoc: Prediction of pro-tein subnuclear localization using SVM with automatic selection from physicochemicalcomposition features[J]. BioSystems,2007,90:573-581.
    [133] A. Ho¨glund, P. Do¨nnes, T. Blum, H.W. Adolph, O. Kohlbacher. MultiLoc: predictionof protein subcellular localization using N-terminal targeting sequences, sequence motifsand amino acid composition[J]. Bioinformatics,2006,22:1158-1165.
    [134] A. Pierleoni, P.L. Martelli, P. Fariselli, Casadio R. BaCelLo: a balanced subcellularlocalization predictor[J]. Bioinformatics,2006,22:e408-416.
    [135] D. Sarda, G.H. Chua, K.B. Li, A. Krishnan. pSLIP: SVM based protein subcellular lo-calization prediction using multiple physicochemical properties[J]. BMC Bioinformatics,2005,6:152.
    [136] J.R. Wang, W.K. Sung, A. Krishnan, K.B. Li. Protein subcellular localization predictionfor Gram-negative bacteria using amino acid subalphabets and a combination of multiplesupport vector machines[J]. BMC Bioinformatics,2005,6:174.
    [137] N.Y. Yu, J.R. Wagner, M.R. Laird, G. Melli, S. Rey, R. Lo, P. Dao, S.C. Sahinalp, M.Ester, L.J. Foster, F.S.L. Brinkman. PSORTb3.0: improved protein subcellular localiza-tion prediction with refined localization subcategories and predictive capabilities for allprokaryotes[J]. Bioinformatics,2010,26:1608-1615.
    [138] X.Q. Zheng, T.G. Liu, J. Wang. A complexity-based method for predicting proteinsubcellular location[J]. Amino Acids,2009,37:427-433.
    [139] K.C. Chou, Y.D. Cai. Using Functional Domain Composition and Support Vector Ma-chines for Prediction of Protein Subcellular Location[J]. J. Biol. Chem.2002,277:45765-45769.
    [140] Z.D. Lei, Y. Dai. Assessing protein similarity with Gene Ontology and its use in subnu-clear localization prediction[J]. BMC Bioinformatics,2006,7:491.
    [141] S.Y. Mei, W. Fei, S.G. Zhou. Gene ontology based transfer learning for protein subcel-lular localization[J]. BMC Bioinformatics,2011,12:44.
    [142] J.M. Chang, E.C.Y. Su, A. Lo, H.S. Chiu, T.Y. Sung, W.L. Hsu. PSLDoc: Proteinsubcellular localization prediction based on gapped-dipeptides and probabilistic latentsemantic analysis[J]. Proteins2008,72:693-710.
    [143] J. Guo, Y.L. Lin. TSSub: eukaryotic protein subcellular localization by extracting fea-tures from profiles[J]. Bioinformatics,2006,22:1784-1785.
    [144] P. Mundra, M. Kumar, K.K. Kumar, V.K. Jayaraman, B.D. Kulkarni. Using pseudoamino acid composition to predict protein subnuclear localization: Approached withPSSM[J]. Pattern Recognit. Lett.,2007,28:1610-1615.
    [145] H.B. Shen, K.C. Chou. Nuc-PLoc: a new web-server for predicting protein subnuclearlocalization by fusing PseAA composition and PsePSSM[J]. Protein Eng. Des. Sel.,2007,20:561-567.
    [146] R.Q. Xiao, Y.Z. Guo, Y.H. Zeng, H.F. Tan, X.M. Pu, M.L. Li. Using position specificscoring matrix and auto covariance to predict protein subnuclear localization[J]. J. Bio.Sci. Eng.,2009,2:51-56.
    [147] C.J. Shin, S. Wong, M.J. Davis, M.A. Ragan. Protein-protein interaction as a predictorof subcellular location[J]. BMC Syst. Biol.2009,3:28.
    [148] Q.H. Cui, T.Z. Jiang, B. Liu, S.D. Ma. Esub8: A novel tool to predict protein subcellularlocalizations in eukaryotic organisms[J]. BMC Bioinformatics,2004,5:66.
    [149] C. Guda, S. Subramaniam. TARGET: a new method for predicting protein subcellularlocalization in eukaryotes[J]. Bioinformatics,2005,21:3963-3969.
    [150] H.B. Shen, K.C. Chou. A top-down approach to enhance the power of predicting humanprotein subcellular localization: Hum-mPLoc2.0[J]. Anal. Biochem.,2009,394:269-274.
    [151] M.M. Zhou, J. Boekhorst, C. Francke, R.J. Siezen. LocateP: Genome-scale subcellular-location predictor for bacterial proteins[J]. BMC Bioinformatics,2008,9:173.
    [152] M. Carmo-Fonseca. The contribution of nuclear compartmentalization to gene regula-tion[J]. Cell,2002,108:513-521.
    [153] R. Hancock. Internal organisation of the nucleus: assembly of compartments by macro-molecular crowding and the nuclear matrix model[J]. Biol. Cell,2004,96:595-601.
    [154] H.G.E. Sutherland, G.K. Mumford, K. Newton, L.V. Ford, R. Farrall, G. Dellaire, J.F. Cáceres, W.A. Bickmore. Large-scale identification of mammalian proteins localized to nuclear sub-compartments[J]. Hum. Mol. Genet., 2001, 10:1995-2011.
    [155] I. Dubchak, I. Muchnik, S.R. Holbrook, S.H. Kim. Prediction of protein folding class using global description of amino acid sequence[J]. Proc. Natl. Acad. Sci. U. S. A., 1995, 92:8700-8704.
    [156] A. Lempel, J. Ziv. On the complexity of finite sequences[J]. IEEE Trans. Inf. Theory, 1976, 22:75-81.
    [157] Z.R. Li, H.H. Lin, L.Y. Han, L. Jiang, X. Chen, Y.Z. Chen. PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence[J]. Nucleic Acids Res., 2008, 34:W32-W37.
    [158] K.C. Chou. Prediction of protein subcellular locations by incorporating quasi-sequence-order effect[J]. Biochem. Biophys. Res. Commun., 2000, 278:477-483.
    [159] S. Wold, J. Jonsson, M. Sjöström, M. Sandberg, S. Rännar. DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least squares projections to latent structures[J]. Anal. Chim. Acta, 1993, 277:239-253.
    [160] L. Yang, Y.Z. Li, R.Q. Xiao, Y.H. Zeng, J.M. Xiao, F.Y. Tan, M.L. Li. Using auto covariance method for functional discrimination of membrane proteins based on evolution information[J]. Amino Acids, 2010, 38:1497-1503.
    [161] Y.H. Zeng, Y.Z. Guo, R.Q. Xiao, L. Yang, L.Z. Yu, M.L. Li. Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach[J]. J. Theor. Biol., 2009, 259:366-372.
    [162] B.J.M. Webb-Robertson, K.G. Ratuiste, C.S. Oehmen. Physicochemical property distributions for accurate and rapid pairwise protein homology detection[J]. BMC Bioinformatics, 2010, 11:145.
    [163] K. Mori, N. Kasashima, T. Yoshioka, Y. Ueno. Prediction of spalling on a ball bearing by applying the discrete wavelet transform to vibration signals[J]. Wear, 1996, 195:162-168.
    [164] K.A. Dill. Theory for the Folding and Stability of Globular Proteins[J]. Biochemistry, 1985, 24:1501-1509.
    [165] Z.G. Yu, V. Anh, K.S. Lau. Fractal analysis of measure representation of large proteins based on the detailed HP model[J]. Physica A, 2004, 337:171-184.
    [166] J. Shen, J. Zhang, X. Luo, W. Zhu, K. Yu, K. Chen, Y. Li, H. Jiang. Predicting protein-protein interactions based only on sequences information[J]. Proc. Natl. Acad. Sci. U. S. A., 2007, 104:4337-4341.
    [167] S. Alejandro, P. Ernesto, L. Segovia. Protein homology detection and fold inference through multiple alignment entropy profiles[J]. Proteins, 2008, 70:248-256.
    [168] L.R. Murphy, A. Wallqvist, R.M. Levy. Simplified amino acid alphabets for protein fold recognition and implications for folding[J]. Protein Eng., 2000, 13:149-152.
    [169] S. Basu, A. Pan, C. Dutta, J. Das. Chaos game representation of proteins[J]. J. Mol. Graph. Model., 1997, 15:279-289.
    [170] T.J. Silhavy, S.A. Benson, S.D. Emr. Mechanisms of Protein Localization[J]. Microbiol. Rev., 1983, 47:313-344.
    [171] J.Y. Yang, Y. Zhou, Z.G. Yu, V. Anh, L.Q. Zhou. Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides[J]. BMC Bioinformatics, 2008, 9:11.
    [172] G.S. Han, Z.G. Yu, V. Anh, R.H. Chan. Distinguishing coding from non-coding sequences in a prokaryote complete genome based on the global descriptor. Proceedings of The 6th International Conference on Fuzzy Systems and Knowledge Discovery: 14-16 August 2009; Tianjin, 2009:42-46.
    [173] H.H. Otu, K. Sayood. A new sequence distance measure for phylogenetic tree construction[J]. Bioinformatics, 2003, 19:2122-2130.
    [174] S. Kawashima, M. Kanehisa. AAindex: Amino Acid index database[J]. Nucleic Acids Res., 2000, 28:374.
    [175] M. Bhasin, G.P.S. Raghava. ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST[J]. Nucleic Acids Res., 2004, 32:W414-419.
    [176] K.C. Chou. Low-frequency collective motion in biomacromolecules and its biological functions[J]. Biophys. Chem., 1988, 30:3-48.
    [177] S.G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation[J]. IEEE Trans. Pattern Anal. Mach. Intell., 1989, 11:674-693.
    [178] A. Kandaswamy, C.S. Kumar, R.P. Ramanathan, S. Jayaraman, N. Malmurugan. Neural classification of lung sounds using wavelet coefficients[J]. Comput. Biol. Med., 2004, 34:523-537.
    [179] S.P. Shi, J.D. Qiu, X.Y. Sun, J.H. Huang, S.Y. Huang, S.B. Suo, R.P. Liang, L. Zhang. Identify submitochondria and subchloroplast locations with pseudo amino acid composition: Approach from the strategy of discrete wavelet transform feature extraction[J]. Biochim. Biophys. Acta, 2011, 1813:424-430.
    [180] G. Dellaire, R. Farrall, W.A. Bickmore. The Nuclear Protein Database (NPD): subnuclear localisation and functional annotation of the nuclear proteome[J]. Nucleic Acids Res., 2003, 31:328-330.
    [181] J. Wang, H.P. Lu, K.N. Plataniotis, J.W. Lu. Gaussian kernel optimization for pattern classification[J]. Pattern Recognit., 2009, 42:1237-1247.
    [182] J.B. Yin, T. Li, H.B. Shen. Gaussian kernel optimization: Complex problem and a simple solution[J]. Neurocomputing, 2011, 74:3816-3822.
    [183] T. Blum, S. Briesemeister, O. Kohlbacher. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction[J]. BMC Bioinformatics, 2009, 10:274.
    [184] J.A. Swets. Measuring the accuracy of diagnostic systems[J]. Science, 1988, 240:1285-1293.
    [185] A.P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms[J]. Pattern Recognit., 1997, 30:1145-1159.
    [186] J.L. Gardy, M.R. Laird, F. Chen, S. Rey, C.J. Walsh, M. Ester, F.S. Brinkman. PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis[J]. Bioinformatics, 2005, 21:617-623.
    [187] L. Breiman. Random forests[J]. Machine Learning, 2001, 45:5-32.
    [188] randomforest-matlab. Available: http://code.google.com/p/randomforestmatlab/.
    [189] M.N. Nguyen, J.C. Rajapakse. Prediction of protein relative solvent accessibility with a two-stage SVM approach[J]. Proteins, 2005, 59:30-37.
    [190] M.N. Nguyen, J.C. Rajapakse. Prediction of Protein Secondary Structure with two-stage multi-class SVMs[J]. Int. J. Data Min. Bioinform., 2007, 1:248-269.
    [191] J. Gubbi, A. Shilton, M. Parker, M. Palaniswami. Protein topology classification using two-stage support vector machines[J]. Genome Inform., 2006, 17:259-269.
    [192] D.V. Nguyen, D.M. Rocke. Tumor classification by partial least squares using microarray gene expression data[J]. Bioinformatics, 2002, 18:39-50.
    [193] Y.X. Tan, L.M. Shi, W.D. Tong, C. Wang. Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data[J]. Nucleic Acids Res., 2005, 33:56-65.
    [194] P.J. Deschavanne, A. Giron, J. Vilain, G. Fagot, B. Fertil. Genomic Signature: Characterization and Classification of Species Assessed by Chaos Game Representation of Sequences[J]. Mol. Biol. Evol., 1999, 16(10):1391-1399.
    [195] S. Karlin, J. Mrazek, A.M. Campbell. Compositional biases of bacterial genomes and evolutionary implications[J]. J. Bacteriol., 1997, 179(12):3899-3913.
    [196] H.J. Jeffrey. Chaos game representation of gene structure[J]. Nucleic Acids Res., 1990, 18(18):2163-2170.
    [197] N. Goldman. Nucleotide, dinucleotide and trinucleotide frequencies explain patterns observed in chaos game representations of DNA sequences[J]. Nucleic Acids Res., 1993, 21(10):2487-2491.
    [198] J.S. Almeida, J.A. Carrico, A. Maretzek, P.A. Noble, M. Fletcher. Analysis of genomic sequences by Chaos Game Representation[J]. Bioinformatics, 2001, 17(5):429-437.
    [199] J. Joseph, R. Sasikumar. Chaos game representation for comparison of whole genomes[J]. BMC Bioinformatics, 2006, 7(243):1-10.
    [200] P.E. Vélez, L.E. Garreta, E. Martínez, N. Díaz, S. Amador, I. Tischer, J.M. Gutiérrez, P.A. Moreno. The Caenorhabditis elegans genome: a multifractal analysis[J]. Genetics and Molecular Research, 2010, 9:949-965.
    [201] P.A. Moreno, P.E. Vélez, E. Martínez, L.E. Garreta, N. Díaz, S. Amador, I. Tischer, J.M. Gutiérrez, A.K. Naik, F. Tobar, F. García. The human genome: a multifractal analysis[J]. BMC Genomics, 2011, 12:506.
    [202] A. Pandit, A.K. Dasanna, S. Sinha. Multifractal analysis of HIV-1 genomes[J]. Molecular Phylogenetics and Evolution, 2012, 62:756-763.
    [203] A. Fiser, G.E. Tusnady, I. Simon. Chaos game representation of protein structures[J]. J. Mol. Graphics, 1994, 12(4):302-304.
    [204] S. Basu, A. Pan, C. Dutta, J. Das. Chaos game representation of proteins[J]. J. Mol. Graphics and Modelling, 1997, 15(5):279-289.
    [205] Z.G. Yu, L. Shi, Q.J. Xiao, V. Anh. Simulation for chaos game representation of genomes by recurrent iterated function systems[J]. J. Biomedical Sci. and Eng., 2008, 1(1):44-51.
    [206] A. Reyes, G. Pesole, C. Saccone. Complete mitochondrial DNA sequence of the fat dormouse, Glis glis: further evidence of rodent paraphyly[J]. Mol. Biol. Evol., 1998, 15(5):499-505.
    [207] T.E. Dowling, C. Moritz, J.D. Palmer, L.H. Rieseberg. Nucleic acids III: analysis of fragments and restriction sites[M]. Sinauer, Sunderland, Mass, 1996.
    [208] D.D. Pollock, J.A. Eisen, N.A. Doggett, M.P. Cummings. A case for evolutionary genomics and the comprehensive examination of sequence biodiversity[J]. Mol. Biol. Evol., 2000, 17(12):1776-1788.
    [209] D. Sankoff, D. Bryant, M. Deneault, B.F. Lang, G. Burger. Early Eukaryote Evolution Based on Mitochondrial Gene Order Breakpoints[J]. J. Comput. Biol., 2000, 7(3-4):521-535.
    [210] K.M. Wong, M.A. Suchard, J.P. Huelsenbeck. Alignment uncertainty and genomic analysis[J]. Science, 2008, 319(5862):473-476.
    [211] G.A. Wu, S.R. Jun, G.E. Sims, S.H. Kim. Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method[J]. Proc. Natl. Acad. Sci. U. S. A., 2009, 106(31):12826-12831.
    [212] G.W. Stuart, K. Moffet, J.J. Leader. A comprehensive vertebrate phylogeny using vector representations of protein sequences from whole genomes[J]. Mol. Biol. Evol., 2002, 19(4):554-562.
    [213] J. Qi, B. Wang, B. Hao. Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach[J]. J. Mol. Evol., 2004, 58(1):1-11.
    [214] Z.G. Yu, L.Q. Zhou, V. Anh, K.H. Chu, S.C. Long, J.Q. Deng. Phylogeny of prokaryotes and chloroplasts revealed by a simple composition approach on all protein sequences from whole genome without sequence alignment[J]. J. Mol. Evol., 2005, 60(4):538-545.
    [215] Z.G. Yu, X.W. Zhan, G.S. Han, R.W. Wang, V. Anh, K.H. Chu. Proper Distance Metrics for Phylogenetic Analysis Using Complete Genomes without Sequence Alignment[J]. Int. J. Mol. Sci., 2010, 11(3):1141-1154.
    [216] N. Takezaki, M. Nei. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA[J]. Genetics, 1996, 144(1):389-399.
    [217] H.C. Causton, J. Quackenbush, A. Brazma. Microarray gene expression data analysis: A beginner’s guide[M]. Blackwell Science Ltd, 2003.
    [218] N. Saitou, M. Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees[J]. Mol. Biol. Evol., 1987, 4(4):406-425.
    [219] D.H. Huson, D. Bryant. Application of Phylogenetic Networks in Evolutionary Studies[J]. Mol. Biol. Evol., 2006, 23(2):254-267. Software: http://www.splitstree.org.
    [220] S.K. Mouchaty, A. Gullberg, A. Janke, U. Arnason. The phylogenetic position of the Talpidae within eutheria based on analysis of complete mitochondrial sequences[J]. Mol. Biol. Evol., 2000, 17(1):60-67.
    [221] M. Nikaido, M.M. Harada, Y. Cao, M. Hasegawa, N. Okada. Monophyletic origin of the order Chiroptera and its phylogenetic position among Mammalia, as inferred from the complete sequence of the mitochondrial DNA of a Japanese megabat, the Ryukyu flying fox Pteropus dasymallus[J]. J. Mol. Evol., 2000, 51(4):318-328.
    [222] A.C. Reyes, C. Gissi, G. Pesole, F.M. Catzeflis, C. Saccone. Where do rodents fit? Evidence from the complete mitochondrial genome of Sciurus vulgaris[J]. Mol. Biol. Evol., 2000, 17(6):979-983.
    [223] M. Li, J.H. Badger, X. Chen, S. Kwong, P. Kearney, H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny[J]. Bioinformatics, 2001, 17(2):149-154.
    [224] R.R. Sokal, F.J. Rohlf. The comparison of dendrograms by objective methods[J]. Taxon, 1962, 11:33-40.
    [225] F.J. Rohlf, D.L. Fisher. Test for hierarchical structure in random data sets[J]. Systematic Zool., 1968, 17:407-412.
