Research on Parallel Algorithm of Multiple DNA Sequence Alignment Based on de Bruijn Graph

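Original title (Chinese): 基于de Bruijn图的DNA多序列比对并行算法研究
English title: Research on Parallel Algorithm of Multiple DNA Sequence Alignment Based on de Bruijn Graph
Author: Zhou Hong (周红)
Thesis level: Doctoral
Discipline: Computer Application Technology
Keywords: multiple sequence alignment ; de Bruijn graph ; star alignment ; maximum weighted path
Degree year: 2010
Supervisor: Zhao Zheng (赵政)
Discipline code: 081203
Degree-granting institution: Tianjin University
Submission date: 2010-08-01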

Abstract

Multiple sequence alignment is one of the important research topics in bioinformatics, with wide applications in gene identification, protein structure prediction, and related fields. Because of the inherent complexity of the problem, no fully satisfactory algorithm exists yet, and with the continuing growth of biological data, serial algorithms can no longer meet demand. This thesis studies how to use de Bruijn graphs for multiple sequence alignment and how to parallelize the approach, and proposes a new parallel multiple sequence alignment algorithm, PL_GAlign. The main work and contributions are as follows:
     A distance parameter is introduced into the graph-based algorithm and an improved star alignment algorithm is adopted. The widely used multiple sequence alignment algorithms are analyzed in detail; the common parallel partitioning strategies perform poorly on these algorithms. The thesis therefore focuses on graph-based multiple sequence alignment algorithms and improves them. To better accommodate genetic variability, a distance parameter d is introduced, so that the exact matching of existing algorithms becomes fuzzy matching that tolerates a certain amount of error. After the center sequence is obtained from the de Bruijn graph, the dynamic programming commonly used in existing algorithms is dropped in favor of an improved star alignment algorithm that is better suited to this setting, which reduces the time complexity of the algorithm to nearly linear.
     Parallel processing strategies are proposed for each stage of the algorithm. To address the high computational complexity of multiple sequence alignment, a parallelization scheme based on the de Bruijn graph is studied. For each stage of the graph-based algorithm (building the de Bruijn graph, removing cycles, finding the maximum weighted path, and pairwise alignment), the serial procedure and its potential for parallelism are examined, and a parallel processing strategy is proposed.
     Finally, a series of tests was carried out. The experimental results show that PL_GAlign runs faster than existing iterative methods, and the advantage becomes more pronounced when the input sequences are longer and more numerous. In accuracy, it is slightly better than CLUSTAL W, currently the most widely used algorithm.
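The abstract does not spell out how the de Bruijn graph is built or how the distance parameter d is measured, so the following Python sketch is only an illustration under stated assumptions: graph nodes are (k-1)-mers, each k-mer contributes one weighted edge, and d is treated as a Hamming-distance bound. The function names `build_de_bruijn`, `hamming`, and `fuzzy_matches` are hypothetical and do not come from the thesis.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_de_bruijn(sequences, k):
    """Build a weighted de Bruijn graph as {(prefix, suffix): weight}.

    Assumption: nodes are (k-1)-mers and every k-mer adds one edge from
    its (k-1)-prefix to its (k-1)-suffix. The weight is the number of
    times that k-mer occurs across all input sequences.
    """
    edges = defaultdict(int)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def fuzzy_matches(kmer, known_kmers, d):
    """Return the known k-mers within Hamming distance d of kmer.

    Assumption: Hamming distance stands in for the thesis's distance
    parameter d; with d = 0 this reduces to exact matching.
    """
    return [
        other for other in known_kmers
        if len(other) == len(kmer) and hamming(kmer, other) <= d
    ]


seqs = ["ACGTAC", "ACGTTC"]
graph = build_de_bruijn(seqs, k=3)
near = fuzzy_matches("GTA", ["GTT", "TAC", "GTA"], d=1)
```

In this toy input the two sequences share the k-mers ACG and CGT, so those edges get weight 2, while the k-mers GTA and GTT differ in one position and would be merged only under fuzzy matching with d of at least 1.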

