Similarity Analysis and Phylogenetic Tree Algorithms Based on a 4D Representation of DNA Sequences
Abstract
With the progress of the Human Genome Project (HGP) and of research on the gene sequences of many species, an ever-growing volume of molecular sequence data has been generated. The need to analyze and process these data scientifically has driven the development of bioinformatics. As the amount of gene sequence data grows, graphical representation has become an important tool for studying it. How to devise effective graphical representations of gene sequences, and on that basis classify genes and analyze their evolutionary relationships, is an active topic in bioinformatics.

This thesis studies graphical representations of DNA sequences, similarity analysis of biological sequences, and algorithms for constructing phylogenetic trees.

The thesis first gives a fairly detailed review of graphical representations of DNA sequences. It starts from simple two-dimensional representations, then introduces the three-dimensional representations built on them, and finally covers higher-dimensional representations. The focus of the thesis, however, is not limited to the graphical representations themselves: it is the similarity analysis of DNA sequences using numerical feature vectors derived from those representations. Chapters 4 and 5 present a new representation proposed by the author, based on the physicochemical properties of nucleotides, together with the corresponding similarity analysis. The method is illustrated on the first exon of the β-globin gene of 11 species, and the results are compared with similarity analyses reported in the literature. Finally, the thesis briefly introduces one application of DNA sequence comparison, the construction of phylogenetic trees, and proposes a maximum-tree fuzzy clustering method for constructing them.
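The abstract names maximum-tree (largest-tree) fuzzy clustering but does not spell out the procedure. The sketch below shows the textbook form of that technique, which may differ from the thesis's own variant: build a maximum spanning tree over a fuzzy similarity matrix, then cut every tree edge whose similarity falls below a threshold; the remaining connected components are the clusters. The function names `maximum_tree` and `clusters_at`, the threshold name `lam`, and the 4x4 matrix `sim` are all illustrative assumptions, not taken from the thesis. In practice the matrix would come from distances between the sequences' numerical feature vectors.

```python
def maximum_tree(sim):
    """Prim-style maximum spanning tree over a symmetric similarity matrix.

    Returns a list of (i, j, weight) edges. Assumes sim is square and symmetric.
    """
    n = len(sim)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the most similar pair linking the tree to a vertex outside it.
        best = max(
            ((i, j, sim[i][j]) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: e[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


def clusters_at(edges, n, lam):
    """Keep tree edges with weight >= lam and return the connected components."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, w in edges:
        if w >= lam:
            parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical similarity matrix for four sequences (made-up values).
sim = [
    [1.0, 0.9, 0.4, 0.3],
    [0.9, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.8],
    [0.3, 0.2, 0.8, 1.0],
]
tree = maximum_tree(sim)
```

Lowering `lam` step by step merges clusters in order of decreasing similarity, and recording those merges gives a tree-shaped hierarchy that can be read as a candidate phylogenetic tree. With the toy matrix, `clusters_at(tree, 4, 0.6)` groups sequences 0 and 1 together and 2 and 3 together, while a threshold of 0.5 merges all four.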
