用户名: 密码: 验证码:
带假结的RNA二级结构预测算法研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着生物信息学的发展和对RNA研究的深入,RNA已经不仅是从DNA到蛋白质的信息传递者,在RNA病毒和某些动物细胞中,RNA还是遗传信息的载体,控制蛋白质的合成,甚至在某些癌细胞和动物胚胎细胞中,可以由RNA转录出DNA。RNA的功能由RNA的结构所决定,对RNA二级结构预测算法的研究已经成为生物信息处理的研究重点。假结是RNA中一种复杂的二级结构,同时假结决定了一些重要的生物功能,因此对带假结的RNA二级结构预测算法的研究是RNA二级结构预测算法研究中的热点。
     首先,本文提出了基于动态权重的RNA二级结构预测的遗传算法。动态权重是最大权重的改进,最大权重是最大堆迭的改进,本文在基于最小自由能的RNA二级结构预测的遗传算法的基础上,用动态权重模型替换了最小自由能模型。实验表明,基于动态权重的RNA二级结构预测的遗传算法不仅能够预测假结,而且还能够预测假结中较复杂的非平面假结。
     其次,本文提出了基于快速动态权重匹配的RNA二级结构预测算法。该算法不仅与动态权重匹配算法一样对某些特定的RNA进行二级结构预测有着很高的准确率和O(n2)的理想空间复杂度,而且在动态权重算法的基础上得到了很好的改进。一是通过引入了最大动态权重茎区快速搜索算法,使得算法的时间复杂度由动态权重匹配算法的O(n3logn)降到了新算法的O(n3);二是扩充了对RNA中假结进行搜索的范围,实验表明,与动态权重匹配算法相比,基于快速动态权重匹配的RNA二级结构预测算法能够预测更多可能存在的假结。
With development of research on bioinformatics and RNA, RNA is no longer simply considered as an information intermediary from DNA to proteins. In RNA viruses and some animal cells, RNA serves as the carrier of germ plasma and control the combining of proteins. Moreover, in some cancer cells or embryo cells, RNA can even transfer into DNA. The functions of RNA are determined by its structures, and research on RNA secondary structure prediction algorithms has become a keystone in the domain of biological information processing. As a kind of complex secondary structure in RNA, pseudoknots determine some important biological functions. So it is a hotspot to study the RNA secondary structure prediction algorithm including pseudoknots in the research of RNA secondary structure prediction algorithm.
     First of all, this thesis advances a genetic algorithm of RNA secondary structure prediction based on dynamic weigh. Dynamic weight results from improvement of maximal weight, which comes from improvement of maximal stack. On the basis of RNA secondary structure prediction genetic algorithm based on free energy minimization, this thesis replaces the free energy minimization model with the dynamic weight model. As indicated by the experiments, the new algorithm can not only predict pseudoknots, but can predict complicated non-plane pseudoknots as well.
     Second, this thesis proposes RNA secondary structure prediction algorithm based on fast dynamic weighted matching. This algorithm, on the one hand, has the same great accuracy and ideal space complexity of O(n6) as dynamic weighted matching algorithm does. On the other hand, it has been improved on the basis of dynamic weighted matching algorithm. That is, first, by importing fast searching of max dynamic weighted stem algorithm, time complexity of dynamic weighted matching algorithm, which is O(n3logn), descends to O(n3) of the new algorithm; Second, the new algorithm expands the searching area of pseudoknots. Experiment indicates that compared with dynamic weighted matching algorithm, RNA secondary structure prediction algorithm based on fast dynamic weighted matching can predict more possibly existing pseudoknots.
引文
[1] 刘海军, 史定华, 王翼飞. 日新月异的 RNA 二级结构预测. 自然杂志, 2003, 25(6): 314-322
    [2] Mirela A, Anthony P F, Frank H, et al. A New Algorithm for RNA Secondary Structure Design. J Mol Biol, 2004, 336(3): 607-624
    [3] Nikolaus S, Knut B. Mapping Property Distributions of Molecular Surfaces: Algorithm and Evaluation of a Novel 3D Quantitative Structure-Activity Relationship Technique. J Med Chem, 2003, 46(8): 1390-1407
    [4] Humberto G, Luis A T, Yaima G, et al. Markovian chemicals in silico design (MARCH-INSIDE), a promising approach for computer-aided molecular design III: 2.5D indices for the discovery of antibacterial. J Mol Model, 2005, 11(2): 116-123
    [5] Kolk M H, Van D G M, Wijmenga S S, et al. NMR structure of a classical pseudoknot: interplay of single- and double-stranded RNA. Science, 1998, 280(5362): 434-438
    [6] 刘振栋, 李恒武, 朱大铭. 计算最大堆迭的 RNA 二级结构预测算法. 南京大学学报(自然科学版), 2005, 41(5): 532-537
    [7] 高琼, 莫忠息, 郑卓. 一种基于能量的 RNA 二级结构预测的动态划分算法. 数学杂志, 2003, 23(1): 43-48
    [8] 谭光明, 冯圣中, 孙凝晖. RNA 二级结构预测中动态规划的优化和有效并行.Journal of Software, 2006, 17(7): 1501-1509
    [9] David H M, Douglas H T. Prediction of RNA secondary structure by free energy minimization. Current Opinion in Structural Biology, 2006, 16(3): 270-278
    [10] David H M. Revolutions in RNA Secondary Structure Prediction. J Mol Biol, 2006, 359(3): 526-532
    [11] 廖波, 王天明. RNA 二级结构的最小自由能算法. 生物数学学报, 2003, 18(3): 364-368
    [12] Jens R, Matthias H, Marc R, et al. Beyond Mfold: Recent advances in RNA bioinformatics. J Biotechnol, 2006, 124(1): 41-55
    [13] Kishore J D, Jamie J C, Christian W C, et al. 2004. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics, 2004, 5: 105
    [14] 任清华, 莫忠息, 陶玉敏. 预测 RNA 二级结构的一种遗传模拟退火算法[J]. 武汉大学学报(理学版), 2004, 50(1): 23-28
    [15] SONG DanDan, DENG ZhiDong. A fuzzy model of predicting RNA secondary structure. Science in China(Series F), 2007, 50(6): 846-866
    [16] David H M, Douglas H T. Dynalign: An Algorithm for Finding the Secondary Structure Common to Two RNA Sequences. J Mol Biol, 2002, 317(2): 191-203
    [17] Gregory D C, Shuyun L,Kaizhong Z. A new algorithm for computing similarity between RNA structures. Information Sciences, 2001, 139(1): 59-77
    [18] Xu X, Ji Y, Stormo G D. RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment. Bioinformatics, 2007, 23(15): 1883-1891
    [19] Xiaoyong Fang, Zhenghua Wang, Zhigang Luo, et al. The detection and assessment of possible RNA secondary structure using multiple sequence alignment. In: Proceedings of the 2007 ACM symposium on Applied computing. New York: ACM, 2007, 133-137
    [20] Serge D, Laurent T. RNA secondary structure comparison: exact analysis of the Zhang–Shasha tree edit algorithm. Theoretical Computer Science, 2003, 306(1-3): 471-484
    [21] Yasuo U, Aki H, Satoshi K, et al. Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science, 1999, 210(2): 277-303
    [22] Ivo L H, Martin F, Peter F S. Secondary Structure Prediction for Aligned RNA Sequences. J Mol Biol, 2002, 319(5): 1059-1066
    [23] Veronica J, Charles W. RNA Secondary Structure Prediction Based on Free Energy and Phylogenetic Analysis. J Mol Biol, 1999, 289(4): 935-947
    [24] Jacek N, Ignacio T. RNA Structure and Stability. Seminars in VIROLOGY, 1997, 8(3): 153-165
    [25] Mirela A, Zhi C Z, Anne C. Secondary Structure Prediction of Interacting RNA Molecules. J Mol Biol, 2005, 345(5): 987-1001
    [26] Lyngs? R B, Pedersen C N S. RNA pseudoknot prediction in energy based models. J Comp Biol, 2001, 7(3-4): 409-427
    [27] Rivas E, Eddy E S R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol, 1999, 285(5): 2053-2068
    [28] 李恒武, 朱大铭, 纪秀花. RNA 二级结构预测算法的设计与实现. 计算机工程与科学, 2006, 28(7): 82-84, 94
    [29] Akutsu T. Dynamic programming algorithms for RNA secondary structureprediction with pseudoknots. Discrete Appl Math, 2000, 104(1-3): 45-62
    [30] Dirks R M, Pierce N A. A partition function algorithm for nucleic acid secondary structure including pseudoknots. J Comput Chem, 2003, 24 (13): 1664-1677
    [31] Lyngs? R B, Pedersen C N. Pseudoknots in RNA secondary structures. In: Computational Molecular Biology (RECOMB). New York: ACM Press, 2000, 201-209
    [32] Cary R B, Stormo G D. Graph-theoretic Approach to RNA Modeling Using Comparative Data. Proc Int Conf Intell Syst Mol Biol, 1995, 3: 75-80
    [33] Haijun L, Dong X, Yifei W. An RNA Folding Algorithm Including Pseudoknots Based on Dynamic Weighted Matching. Computational Biology and Chemistry, 2006, 30(1): 72–76
    [34] 王德宝, 祁国荣. 核酸:结构,功能与合成. 北京: 科学出版社, 1987, 1-254
    [35] 付微, 黄竞伟, 徐丽. RNA 二级结构表示方法及其转换算法. 计算机工程与应用, 2004, 40(14): 43-45 85
    [36] Dam T E, Pleij K, Draper D. Structural and Functional Aspects of RNA Pseudoknots. Biochemistry, 1992, 31(47): 11665-11676
    [37] Wang L. The practical Bioinformatician. Singapore: World Scientific Pub Co, 2004, 167-192
    [38] Pipas J M,McMahon J E. Method for predicting RNA secondary structure. PNAS USA, 1975, 72(6): 2017-2021
    [39] Fontana W,Stadler P E,et al. RNA folding and combinatory landscapes. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics, 1993, 47(3): 2083-2099
    [40] Hofacker I L,Schuster P. Combinatorics of RNA Secondary Structures. Discr Appl Math, 1998, 88(1-3): 207-237
    [41] Nebel M E. Combinatorial properties of RNA secondary structures. J Comput Biol, 2002, 9(3): 541-573
    [42] Macke T J, Ecker D J, et al. RNAMotif, an RNA Secondary Structure Definition and Search Algorithm. Nucl Acids Res, 2001, 29(22): 4724-4735
    [43] Bouthinon D, Soldano H. A New Method to Predict the Consensus Secondary Structures of a Set of Unaligned RNA Sequences. Bioinformatics, 1999, 15(10): 785-798
    [44] Gorodkin J, Sticklin S L, Stormo G D. Discovering common stem-loop motifs in unaligned RNA sequences. Nucleic Acids Res, 2001, 29(10): 2135-2144
    [45] Gauteret D, Major F, Cedergren R. Pattern searching alignment with RNAprimary and secondary structures: an effective descriptor for tRNA. Conput Appl Biocsi, 1990, 6(4): 325-331
    [46] Laferriere A, Gautheret D, Cedergren R. An RNA Pattern Matching Program with Enhanced Performance and portability. Comput Appl Bfocsi, 1994, 10(2): 211-212
    [47] Pesole G, Liuni S, D Souza M. Pat Search: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance. Bioinformatics, 2000, 16(5): 439-450
    [48] Van Bantenburg F H D, Gultyaev A P, et al. An APL-programmed Genetic Algorism for the Prediction of RNA Secodnary Struclure. J Theor Biol, 1995, 147(3): 269-280
    [49] Cultyaev A P, Van Batenburg F H D, et al. The Computer simulation of RNA Folding Pathways Using a Genetic Algorithm. J Mol Biol, 1995, 250(1): 37-51
    [50] Minoru K. Post-genome informatics. New York: Oxford University press, 2000, 1-146
    [51] Steeg E W. Neural networks: adaptive optimization and RNA secondary structure prediction. In: Artificial inte1ligecne and molecu1ar biology. Menlo Park: AAAI Press, 1995, 121-160
    [52] Ieong S, Kao M Y, et al. Predicting RNA secondary structures with arbitrary pseudoknots by maximizing the number of stacking pairs. J Comput Biol, 2003, 10(6): 195-198
    [53] Tabaska J E, Cary R B, et al. An RNA Folding Method Capable of Identifying Pseudoknots and Base Triples, Bioinformatics, 1998, 14(8): 691-699
    [54] Page R P M. Circles: Automating the Comparative Analysis of RNA Secondary Structure. Bioinformatics, 2000, 16(11): 1042-1043
    [55] Eddy S R. How Do RNA folding algorithms work. Nature Biotechnology, 2004, 22(11): 1457-1458
    [56] Page R D M. Comparative analysis of secondary structure of insect mitochondrial small subunit ribosomal RNA using maximum weighted matching. Nucl Acids Res, 2000, 28(20): 3839-3845
    [57] Durbin R, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge UK: Cambridge University Press, 1998, 260-298
    [58] Zuker M. Computer Prediction of RNA Structure. Method Enzymology, 1989, 180: 262-288
    [59] Ruan J, Stormo G D, Zhang W. An Iterated Loop Matching Approach to the Prediction of RNA Secondary Structures with Pseudoknots. Bioinformatics, 2004, 20(1): 58-66
    [60] Baldi P, Brunak S, et al. Assessing the Accuracy of Prediction Algorithms for Classification: an Overview. Bioinformatics 2000, 16(5): 412-424

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700