用户名: 密码: 验证码:
RNA二级结构的计数问题及其进化分析
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
RNA是由A、C、G、U四种不同的核苷酸组成的单链。RNA通过自身回折,使链中的一部分核苷酸与其它部分核苷酸互补配对,形成RNA二级结构。RNA二级结构的计数研究是计算分子生物学的重要课题之一。由于RNA二级结构经常被抽象为离散的数学对象,从而使得离散数学和分子生物学密切联系在一起。一方面,组合技巧成功的应用到了RNA二级结构的计数问题中;另一方面,RNA二级结构的计数问题启发了新的有趣的组合问题。另外,人类基因组计划的实现产生了大量的数据,如何选择有效的方法从这些数据中提取信息,进而分析物种间的进化关系,将面临着巨大的挑战。本文主要研究了RNA二级结构的组合计数问题及其进化分析,主要内容如下:
     一、详细介绍了RNA二级结构的基本信息,主要包括二级结构的各组成元素以及各种传统的表示形式,并且利用发生函数的方法讨论了限制端环长度的RNA二级结构的计数问题。另外,给出了一种计算S_m(n)的方法。
     二、为了化简限制端环长度的RNA二级结构的递推公式,建立了二级结构与组合数学中三种特殊的集合间的一一对应。通过建立的双射关系,得到了关于限制端环长度至少为m且有k个基对的二级结构数S_m(n,k)的一个封闭和式。
     三、按照Watson-Crick碱基配对原则,用圈表示A(U)而用点来表示G(C),提出了RNA二级结构一种新的表示形式。这种表示比传统的表示形式更为合理。在此基础之上,研究了以端环长度为参数且带有各种限制条件的二级结构的计数问题,并且进一步研究了同时选取端环和堆积的长度作为参数的二级结构的渐近计数问题.
     四、将两组复杂的RNA二级结构分别转换成定义在20个字符上的线性符号序列,计算出其LZ复杂性,进而基于两种不同的算法构造了进化树,结果充分证明了我们方法的有效性。
An RNA molecule is a single-stranded chain consisting of the nucleotides A, U, C, G. A nucleotide in one part of the molecule can be paired with a complementary nucleotide in another part, and thus the molecular folds to form secondary structures. Today, the research on the enumeration of RNA secondary structures is one of the hot topics in Computational Molecular Biology. RNA secondary structures are usually abstracted to discrete mathematic objects, which establishes a connection between Discrete Mathematics and Computational Molecular Biology. On the one hand, the skills of combinatorics enumeration have been successfully applied to the enumeration problems of RNA secondary structures. On the other hand, the enumeration problems of RNA secondary structures have inspired some new interesting combinatorial questions. In addition, there are abundant of data during the implement of Human Project. And it is a great challenge to analyze the phylogenetic relations among different species by choosing valid methods used to extract essential information. The core of this thesis is RNA secondary structures, and we mainly discusses the enumeration problems and the evolution relations. The main contents of this thesis can be summarized as follows:
     Chapter 2 describes some information of the RNA secondary structure in detail, including some elements of RNA secondary structures and various traditional representations, and makes a further discussion about the enumeration problem of RNA secondary structures with limited length of each loop using its generating function. Moreover, a method to compute S_m(n) is given.
     In order to specify the recurrence relations of RNA secondary structures with limited length of each loop, we establish one to one correspondences between the sets of secondary structures and three special sets in combinatorics in chapter 3. And an exact expression about secondary structures with k base pairs whose loop has at least m bases is obtained by one bijection.
     In chapter 4, a new representation is obtained based on the Watson-Crick principle, i.e., let circles to represent the bases A(U) and dots to represent the bases G(C). The representation is more reasonable and meaningful than the traditional representations. Based on the new representation, we make a new discussion by choosing the least length of loops and stacks as parameters.
     In the final chapter, we transform complex RNA secondary structures into linear symbolic sequences defined in 20 alphabet, and compute their LZ complexities. Furthermore, we obtain the phylogenetic trees using two different programmes. The results adequately indicate the validity of our method.
引文
[1]张德阳编著,生物信息学,北京:科学出版社,2004.
    [2]卢向阳主编,分子生物学,北京:中国农业出版社,2004.
    [3]金由辛编,核糖核酸与核糖核酸组学,北京:科学出版社,2005.
    [4]叶林柏等编著,基础分子生物学,北京:科学出版社,2004.
    [5]I.Tinoco,Jr.and C.Bustamante,How RNA folds,J.Mol.Biol.,293(1999) 271-281.
    [6]J.A.Doudna,Structural genomics of RNA,Nat.Struct.Biol.,7(2000) 954-956.
    [7]J.Couzin,Breakthrough of the year:Small RNAs make big splash,Science,298(2002) 2296-2297.
    [8]李建会,人类基因组研究的价值和社会伦理问题,自然辩证法研究,1(2001)24-28.
    [9]S.Y.Le,R.Nussinov,J.V.Mazel,Tree graphs of RNA secondary structures and their comparsions,Comput.Biomed.Res.,22(1989) 461-473.
    [10]L.J.Collins,V.Moulton,D.Penny,Use of RNA secondary structure for studying the evolution of RNase P and RNase MRP,J.Mol.Evol.,51(2000) 194-204.
    [11]S.Dulucq,L.Tichit,RNA secondary structure comparison:exact analysis of the ZhangShasha tree edit algorithm,Theor.Comput.Sci.,306(2003) 471-484.
    [12]I.L.Hofacker,S.H.Bernhart,P.F.Stadler,Alignment of RNA base pairing probability matrices,Bioinformatics,20(2004) 2222-2227.
    [13]N.Liu,T.M.Wang,A method for rapid similarity analysis of RNA secondary structures,BMC Bioinformatics,7(2006) 493-503.
    [14]Q.Dai,T.M.Wang,Use of statistical measures for analyzing RNA secondary structures,J.Comput.Chem.,29(2008) 1292-1305.
    [15]P.R.Stein,M.S.Waterman,On some new sequences generalizing the Catalan and Motzkin numbers,Discrete Math.,26(1978) 261-272.
    [16]M.Régnier,Generating functions in computational biology,invited to MABS'97,1998.
    [17]I.L.Hofacker,P.Schuster,P.F.Stadler,Combinatorics of RNA secondary structures,Discrete Appl.Math.,88(1998) 207-237.
    [18]C.Dennis,The brave new world of RNA,Nature,418(2002) 122-124.
    [19]廖波,王天明,一类RNA二级结构的计数,应用数学,15(2002)109-112.
    [20]C.Flamm,I.L.Hofacker,P.F.Stadler,Computational chemistry with RNA secondary structures,Kemija u industriji,53(2004) 315-322.
    [21]M.Vauchaussade de Chaumont,X.G.Viennot,Enumeration of RNA secondary structures by complexity,Math.Med.Biol.Lecture Notes Biomath.,57(1985) 360-365.
    [22]T.Doslic,D.Svrtan,D.Veljan,Enumerative aspects of secondary structures,Discrete Math.,285(2004) 67-82.
    [23]D.Barash,Spectral decomposition for the search and analysis of RNA aecondary atructure,J.Comput.Biol.,11(2004) 1169-74.
    [24]G.Vernizzi,H.Orland,A.Zee,Enumeration of RNA structures by matrix models,Phys.Rev.Lett.,94(2005) 168103.
    [25]D.E.Krane,M.L.Raymer著,孙啸,陆祖宏,谢建明等译,生物信息学概论,清华大学出版社,2004.
    [26]M.S.Waterman,T.F.Smith,RNA secondary structure:A complete mathematical analysis,Math.Biosic.,42(1978) 257-266.
    [27]M.Zuker,D.Sankoff,RNA secondary structures and their prediction,Bull.Math.Biol.,46(1984)591-621.
    [28]A.Nkwanta,Lattice paths and RNA secondary structures,DIMACS Series in Discrete Mathematics and Theoretical Computer Science,34(1997) 137-147.
    [29]B.Liao,T.M.Wang,General combinatorics of RNA secondary structure,Math.Biosci.,191(2004)69-81.
    [30]C.Haslinger,P.F.Stadier,RNA structures with pseudo-knots:Graph-theoretical and combinatorial Properties,Bull.Math.Biol.,61(1999) 437-467.
    [31]M.S.Waterman,Secondary structure of single-stranded nucleic acids,Adv.Math.Suppl.Stud.,1(1978) 167-212.
    [32]M.S.Waterman,T.F.Smith,Rapid dynamic programming algorithms for RNA secondary structure,Adv.Appl.Math.,7(1986) 455-464.
    [33]J.S.McCaskill,The equilibrium partition function and base pair binding probilities for RNA secondary structure,Biopolymers,29(1990) 1105-1119.
    [34]M.Traker,W.Fontana,P.F.Stadler,P.Schuster,Statistics of RNA melting kinetics,Eur.Biophys.J.,23(1994) 29-38.
    [35]J.A.Howell,T.F.Smith,M.S.Waterman,Computation of generating function for biological molecules,SIAM J.Appl.Math.,39(1980) 119-133.
    [36]M.E.Nebel,Combinatorial properties of RNA secondary structures,J.Comput.Biol.,9(2002)541-573.
    [37]W.R.Schmitt,M.S.Waterman,Linear trees and RNA secondary structure,Discrete Appl.Math.,51(1994) 317-323.
    [38]R.Willenbring,RNA secondary structure,Permutations and Statistics,submitted.
    [39]R.Stanley,Enumerative combinatorics,vol.2,Cambridge University Press,1999.
    [40]Ch.L.Liu,RNA secondary structure and bioloured ordered trees,Ars Combin.,81(2006) 305-309.
    [41]A.B(o|¨)ck,K.Forschhammer,J.Heider,et al,Selenoprotein synthesis:an expansion of the genetic code,Trends Biochem.Sci.,16(1991) 463-467.
    [42]J.Heider,C.Baron,A.B(o|¨)ck,Coding from a distance dissection of the mRNA determinants required for the incorporation of selenocysteine into protein,EMBO J.,11(1992) 3759-3766.
    [43]S.Moon,Y.Byun,H.J.Kim,et al,Predicting genes expressed via -1 and +1 frameshifts,Nucleic Acids Res.,32(2004) 4884-4892.
    [44]L.P.Lim,M.E.Glasner,S.Yekta,et al,Vertebrate microRNA genes,Science,299(2003) 1540.
    [45]T.Tuschl,Functional genomics:RNA sets the standard,Nature,421(2003) 220-221.
    [46]M.Zuker,On finding all suboptimal foldings of an RNA molecule,Science,244(1989) 48-52.
    [47]V.Moulton,M.Zuker,M.Steel,et al.,Metrics on RNA secondary structures,J.Comput.Biol.,7(2000) 277-292.
    [48]M.Zuker,J.A.Jaeger,D.H.Turner,A comparison of optimal and suboptimal RNA secondary structures predicted by free energy minimization with structures determined by phylogenetic comparison,Nucleic Acids Res.,19(1991) 2707-2714.
    [49]B.Liao,T.M.Wang,A 3D Graphical Representation of RNA Secondary Structures,J.Biomol.struct.Dyna.,21(2004) 827-832.
    [50]B.Shapiro,K.Zhang,Comparing multiple RNA secondary structures using tree comparisons,Comput.Appl.Biosci.,6(1990) 309-318.
    [51]C.Chevalet,B.Michot,An algorithm for comparing RNA secondary structures and search for similar substructures,Comput.Appl.Biosci.,8(1992) 215-225.
    [52]I.L.Hofacker,W.Fontana,P.F.Stadler,S.Bonhoeffer,M.Tacker,P.Schuster,Fast folding and comparison of RNA secondary structures,Monatshefte f.Chemie,125(1994) 167-188.
    [53]K.Zhang,Computing similarity between RNA secondary structures,In Proceedings of the IEEE International Joint Symposia on Intelligence and Systems,(1998) 126-132.
    [54]V.Bafna,S.Muthukrishnan,R.Ravi,In Proceedings 6th Symp.on Combinatorial Pattern Matching CPM-95,1995.
    [55]C.Li,A.H.Wang,L.L.Xing,Similarity of RNA secondary structures,J.Comput.Chem.,28(2007) 508-512.
    [56]塞图宝,梅丹尼斯著,朱浩等译,计算分子生物学导论,北京:科学出版社,2003.
    [57]A.Condon,B.Davy,B.Rastegari,F.Tarrant,S.Zhao,Classifying RNA pseudoknotted structures,Theor.Comput.Sci.,320(2004) 35-50.
    [58]R.B.Lyngso,C.Pedersen,Pseudoknots in RNA secondary structures,in:Proceedings of the Fourth Annual International Conference on Computational Molecular Biology(RECOMB' 2000),8-11 April,Tokyo,Japan,pp.201-209.
    [59]P.Schuster,P.F.Stadler,Discrete models of biopolymers,in:J.Crabbe,A.Konopka,M.Drew (Eds.).Handbook of Computational Chemistry and Biology,Mareel Dekker Inc.
    [60]刘次全,曹恩华,白春礼等著,核酸结构多态性,北京:高等教育出版社,2000.
    [61]W.Y.C.Chen,E.Y.P.Deng,R.R.X.Du,Reduction of m-regular noncrossing partitions,European J.Combin.,26(2005) 237-243.
    [62]G.E.Andrews,The theory of partitions,Reading,Mass.:Addison-Wesley Pub.Co.,Advanced Book Program,1976.
    [63]R.Simion,D.Ullman,On the structure of the lattice of noncrossing partitions,Discrete Math.,98(1991) 193-206.
    [64]M.Klazar,On abab-free and abba-free set partitions,European J.Combin.,17(1996) 53-68.
    [65]王振宇著,树的枚举与算法复杂性分析,北京:国防工业出版社,1991.
    [66]Ch.W.Song,The Generalized Schr(o|¨)der Theorey,Electron.J.Combin.12(2005) 53-62.
    [67]M.Go,Statistical mechanics of biopolymers and its application to the melting transition of polynu- cleotides, J. Phys. Soc. Japan., 23 (1967) 597-608.
    [68] M. E. Nebel, Investigation of the Bernoulli model for RNA secondary structures, Bull. Math. Biol., 66 (2004) 925-964.
    
    [69] E. A. Bender, Asymptotic methods in enumeration, SIAM Rev. 16 (1974) 485-515.
    [70] E. R. Canfield, Remarks on an asymptotic method in combinatorics, J. Combin. Theory A 37 (1984) 348-352.
    [71] A. Meir, J. W. Moon, On an asymptotic method in enumeration, J. Combin. Theory A, 51 (1989) 77-89.
    [72] V. D. Gusev, L. A. Nemytikova, N. A. Chuzhanova, On the complexity measures of genetic sequences, Bioinformatics, 15 (1999) 994-999.
    [73] A. Lempel, J. Ziv, On the complexity of finite sequences, IEEE Trans. Inform. Theory, 22 (1976) 75-81.
    [74] X. Chen, S. Kwong, M. Li, A compression algorithm for DNA sequences and its applications in genome comparison, Genome Inform. Ser. Workshop Genome Inform., 10 (1999) 51-61.
    [75] T. Uyematsu, Lempel-Ziv coding as a tool for information theory, IEICE Trans. Fundamentals (Japanese Edition), 84 (2001) 681-690.
    [76] B. Li, et al. LZ complexity distance of DNA sequences and its application in phylogenetic tree reconstruction, Geno. Prot. Bioinformatics, 3 (2005) 206-212.
    
    [77] H. H. Otu, K. Sayood, A new sequence distance measure for phylogenetic tree construction, Bioinformatics, 19 (2003) 2122-2130.
    [78] J. R. Chamberlain, E. Pagan-Ramos, D. W. Kindelberger, D. R. Engelke, An RNase P RNA subunit mutation affects ribosomal RNA processing, Nucleic Acids Res., 24 (1996) 3158-3166.
    [79] Y. H. Park, H. Hori, K. Suzuki, S. Osawa, K. Komagata, Phylogenetic analysis of the coryneform bacteria by 5S rRNA sequences, J. Bacteriol, 169 (1987) 1801-1806.
    [80] M. Schmitt, J. Bennett, D. Dairaghi, Secondary structure of RNase MRP RNA as predicted by phylogenetic comparison, FASEB J., 7 (1993) 208-213.
    
    [81] J. Felsenstein, PHYLIP-phylogeny inference package (version 3.2), Cladistics, 5 (1989) 164-166.
    [82] R. D. M. Page, TREEVIEW: An application to display phylogenetic trees on personal computers, Comput. Appl. Biosci., 12 (1996) 357-358.
    [83] R. Reddy, S. Shimba. Structural and functional similarities between MRP and RNase P, Mol. Biol. Rep., 22 (1996) 81-85.
    [84] J. P. Morrissey, D. Tollervey, Birth of the snoRNPs: the evolution of RNase MRP and the eukaryotic pre-rRNA-processing system, TIBS, 20 (1995) 78-82.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700