Genome-Wide Discovery and Study of Missing Pathway Genes and Miniature Inverted-Repeat Transposable Elements
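Abstract
With the development of biotechnology, large amounts of genomic data have become available for understanding how genomes work. More and more genomes are being sequenced, and molecular biology has entered the so-called post-genomic era. We can now directly explore global properties of genomes, for example by obtaining the genome-wide distribution of any gene of interest and by comparing multiple related genomes to understand biological functions. Genes are linked into complex networks through physical and chemical interactions with one another and with metabolites, and elucidating the topology, local organization and dynamic behavior of the complete gene-protein-metabolism network is the ultimate goal of systems biology. However, existing gene networks and metabolic pathways are still far from complete: many reconstructed networks contain numerous missing genes, or "pathway holes", and the functions and exact network positions of many genes, as well as many nodes and connections in biological networks, remain to be determined. This is the missing pathway gene problem, and finding these missing genes is an important and challenging task in systems biology. In addition, miniature inverted-repeat transposable elements (MITEs) are important functional elements of the genome that can affect genome size and gene function by transposing and by increasing their copy number. Identifying all MITEs in a genome, together with their distribution, gives deeper insight into genome function and evolutionary history. For both problems, this thesis presents effective algorithms that quickly and accurately find all candidate missing genes and MITEs at the whole-genome level. The main research contents and innovation points of the thesis are as follows: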
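     ·Main Research Contents
     For the missing pathway gene problem, we build a genome reference graph from operon information, gene similarity information and phylogenetic profile information, and use graph algorithms together with newly designed information fusion and gene ranking methods to find missing genes at the whole-genome level. To further improve accuracy and efficiency, we also use regulon information: we first present an effective motif-finding algorithm, use it to compute all possible regulons in the target genome, and then incorporate this information into the genome reference graph. Experiments show that the method is highly effective: it finds a large number of relevant missing genes in the target networks, further reveals structural and functional properties of gene networks, substantially improves the accuracy of existing gene networks, and is highly robust.
     For the MITE discovery problem, we present, for the first time, an algorithm for finding and analyzing MITEs across a whole genome, and we provide it as an online service, the MUST system (http://csbl1.bmb.uga.edu/ffzhou/MUST/). In applications to many prokaryotic genomes, the system found, for the first time, a large number of recently active MITEs, and it also revealed for the first time that MITEs interact with operons and with neighboring genes. These findings lay a foundation for understanding genome dynamics and gene function.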
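     Chapter 1, the introduction, reviews the background in biology, graph theory and computational complexity theory used in this thesis.
     Chapter 2 presents, for the first time, a method that uses operon information, gene similarity information and phylogenetic profile information to find missing pathway genes. We select 185 genomes by comparing genome sizes and evolutionary relationships and use them to build the genome reference graph. The graph takes all genes as vertices, and two genes are joined by an edge if and only if they lie in the same operon or are similar genes. For a target pathway, we use all of its known genes as reference genes, collect the genes most closely related to the reference genes as the candidate set, and apply a hierarchical information fusion and ranking method to produce the final gene ranking. We tested all 121 E. coli pathways in the KEGG database. When a target pathway contains more than 5 known genes, the positive predictive value (PPV) of the method reaches 60%, and it rises to 90% as the number of genes increases. This accuracy is far higher than that of existing related algorithms, and parameter analysis shows that the method is highly robust. Many of the predictions have also been confirmed by a recently updated release of KEGG. The experiments further suggest that many pathways may be functionally coherent at a higher structural level, which deepens the study of the structural and functional properties of pathways.
     Chapter 3 presents an effective motif-finding algorithm. It introduces new concepts such as the sequence neighborhood set and the probability scoring matrix, and uses a recursive computation strategy to remove noise. In tests on many real biological sequences, the algorithm finds true motifs more effectively than related algorithms.
     Chapter 4 introduces regulon information to further improve the accuracy of the missing-gene method. We first use the motif-finding algorithm of Chapter 3 to characterize all regulon structures in the target genome, and then merge this information into the genome reference graph of Chapter 2. Experiments show that regulon information further improves the accuracy of the missing-gene algorithm: for all pathways with more than 20 genes, the average PPV increases by about a further 2%.
     Chapter 5 presents, for the first time, an algorithm that finds all possible MITEs at the whole-genome level, implemented as an online service, the MUST system (http://csbl1.bmb.uga.edu/ffzhou/MUST/). The algorithm classifies MITEs by structure and sequence similarity and outputs a range of related statistical and evolutionary information. Using MUST, we successfully recovered the well-studied MITE family Nezha in Anabaena variabilis ATCC 29413 and also found new MITEs with recent activity. In addition, we found several MITE families in Haloquadratum walsbyi DSM 16790 for the first time. These families all have conserved terminal structures and high sequence similarity, and they show evolutionary traces of recent activity. Haloquadratum walsbyi DSM 16790 is an extremely halophilic prokaryote, and this is the first report of MITEs in this extremophile. The presence of so many MITE families further suggests that MITEs may take part in very important genomic functions, and that this activity remains considerable even in species living in extreme environments.
     Chapter 6 reports the first discovery of a MITE in Leptospira, named Yuanxiao, which is abundant in four closely related Leptospira strains. Leptospira is a pathogen that causes leptospirosis, a disease transmitted between animals and humans. We found an evolutionary relationship between Yuanxiao and the transposable element ISLin1, which suggests that MITEs may evolve from transposable elements through deletion of the protein-coding region. The study also indicates that Yuanxiao may take part in regulating neighboring genes, which provides a good basis for studying how MITEs arise, amplify and transpose. It also offers a new perspective, at the gene level, on changes in gene expression and function in this pathogen.
     Chapter 7 reports the first discovery of a recently active MITE, Chunjie, in Geobacter uraniireducens Rf4, and the first observation that Chunjie can insert into an operon without disrupting the operon structure. This further reveals the transposition behavior of MITEs and shows, for the first time, how MITEs can influence the evolution of operon structure.
     The final chapter concludes the thesis.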
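     ·Innovation Points of the Thesis
     Innovation point 1. For the first time, three kinds of information are combined into a method that finds missing pathway genes at the whole-genome level, using multi-genome comparison to find as many missing genes in the target network as possible. The method is highly accurate and robust, and it substantially improves on the accuracy and results of current related methods. For the 121 target pathways of E. coli, it finds a large number of missing genes and also reveals new structural connections within and between pathways, laying a foundation for further study of the function and structure of gene networks.
     Innovation point 1 is presented in Chapter 2.
     Innovation point 2. A new, effective algorithm for finding transcription factor motifs is proposed and used to compute all regulon structures in the target genome. The regulon information is then merged into the method for finding missing pathway genes, further improving its accuracy.
     Innovation point 2 is presented in Chapters 3 and 4.
     Innovation point 3. For the first time, a method for finding and analyzing MITEs at the whole-genome level is presented and implemented as an online service (the MUST system). For a given genome, the system finds all possible MITEs and analyzes many of their properties. Using the system, we observe for the first time that Haloquadratum walsbyi DSM 16790 contains a large number of MITE families. This observation shows, for the first time, that MITEs remain active and play an important role in extremophiles (here, a halophile).
     Innovation point 3 is presented in Chapter 5.
     Innovation point 4. A recently active MITE family, Yuanxiao, is found in Leptospira for the first time. Yuanxiao shares structural and sequence similarity with certain transposable elements and may also play a role in the transcriptional regulation of neighboring genes. Its discovery provides a sample for further study of how MITEs arise, amplify and transpose, and it offers a new perspective on the pathogenic mechanism of Leptospira.
     Innovation point 4 is presented in Chapter 6.
     Innovation point 5. A recently active MITE family, Chunjie, is found in Geobacter uraniireducens Rf4 for the first time, and Chunjie is found to have inserted into an operon. This is the first observation of a MITE inserting into an operon without disrupting its structure and function, and it provides direct evidence for studying genome change in prokaryotes, especially operon evolution.
     Innovation point 5 is presented in Chapter 7.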
With the development of biotechnology, large amounts of genomic data are now available for understanding genome architecture. More and more genomes have been sequenced, and molecular biology has entered the so-called post-genomic era. We can now directly interrogate global properties such as base frequencies and repetitive content, obtain the distribution of any gene of interest at the genome level, and understand biological pathways by comparing multiple related genomes. Many pathways have been built to study molecular mechanisms. However, many current pathways are incomplete and sometimes contain errors, and completing existing pathways at the genome level is a challenging problem in pathway research. Transposable elements are important components of the genome that can affect genome size and gene function, and finding them gives more insight into genome evolution. This thesis presents genome-wide discovery and analysis of missing pathway genes and miniature inverted-repeat transposable elements (MITEs). The main research contents and innovation points of this thesis are as follows:
     ·Main Research Contents
     For the missing pathway gene problem, we first introduce a powerful method that finds missing genes at the genome level using operon information, similarity information and phylogenetic profile information, which substantially improves on existing results. We then introduce a motif-finding algorithm, use it to construct regulon information, and combine that information with the method to find missing genes. Our experiments show that the approach is highly effective and reveals additional structural properties of pathways.
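To make the phylogenetic profile idea concrete, the sketch below compares binary presence/absence profiles of genes across a set of reference genomes. It is only an illustration: the Jaccard measure, the function names and the toy profiles are assumptions, and the abstract does not say which similarity measure the thesis uses.

```python
from typing import Dict, List, Tuple


def profile_similarity(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_by_profile(query: str, profiles: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Rank all other genes by profile similarity to the query gene."""
    scores = [
        (gene, profile_similarity(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: 1 means a homolog is present in that reference genome.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 1],
    "geneC": [0, 0, 1, 0, 1],
}
ranking = rank_by_profile("geneA", profiles)
```

Genes that tend to be present and absent in the same genomes receive a high score, which is the usual rationale for treating a shared profile as evidence of a functional link.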
     For the problem of finding miniature inverted-repeat transposable elements (MITEs), we present an algorithm that finds all possible MITEs at the genome level. The method is fast and effective, and it also provides further analysis of the MITEs it finds. We apply it to many prokaryotes and find many new functional and mapping properties of MITEs. Our studies also reveal new properties of genome dynamics and gene function.
     Chapter 1 gives a brief introduction to the basic concepts of biology, graph theory and computational complexity theory that are used in the thesis.
     Chapter 2 presents, for the first time, a method that identifies missing genes in a pathway and recruits new genes into it using homology information, operon information and phylogenetic profile information. We carefully select 185 genomes based on their evolutionary relationships and genome sizes, predict operons for all selected genomes, and compute homology between gene pairs. We then construct a large graph, the "genome reference graph", whose vertices are the genes of the 185 genomes. Two vertices are joined by an edge if and only if the corresponding genes are in the same operon or are homologs, and each edge is weighted according to the kind of evidence that produced it. For a specific pathway P in the target genome, we assume that some of its genes are known (normally these are identified by an ortholog-based method). Starting from the known genes, we calculate the shortest paths between them and all other genes in the target genome; genes with shorter paths to the known genes are ranked as more likely to belong to P. We validate the method on the KEGG pathways for E. coli. The method achieves a positive predictive value (PPV) of 60% among the top 10 candidates (out of 4131) when the reference pathway contains 5 or more genes, and the PPV reaches 90% as the number of genes in the pathway increases. Parameter analysis shows that the method is very robust, and some predictions that were initially counted as negative are confirmed by the most recent release of KEGG. Further analysis shows that many negative predictions fall together in the same other pathway, which gives new insight into how pathways are defined.
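The ranking step described for Chapter 2 can be sketched as a multi-source shortest-path search. The code below is a minimal illustration, not the thesis implementation: the edge weights, the gene names and the choice to score each candidate by its distance to the nearest known gene are all assumptions, since the abstract does not specify them.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def add_edge(graph: Graph, u: str, v: str, weight: float) -> None:
    """Add an undirected weighted edge (e.g. same operon or homolog)."""
    graph.setdefault(u, []).append((v, weight))
    graph.setdefault(v, []).append((u, weight))


def rank_candidates(graph: Graph, known: Iterable[str]) -> List[Tuple[str, float]]:
    """Dijkstra from all known pathway genes at once.

    Returns the remaining reachable genes sorted by their distance to the
    nearest known gene, closest first.
    """
    known_set = set(known)
    dist: Dict[str, float] = {gene: 0.0 for gene in known_set}
    heap = [(0.0, gene) for gene in known_set]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    candidates = [(g, d) for g, d in dist.items() if g not in known_set]
    return sorted(candidates, key=lambda item: (item[1], item[0]))


# Hypothetical toy graph: k1 and k2 are known pathway genes, c1..c3 candidates.
graph: Graph = {}
add_edge(graph, "k1", "c1", 1.0)   # assumed weight for a same-operon edge
add_edge(graph, "k2", "c2", 2.0)   # assumed weight for a homology edge
add_edge(graph, "c1", "c3", 2.0)
add_edge(graph, "c2", "c3", 0.5)
ranked = rank_candidates(graph, ["k1", "k2"])
```

Starting the search from all known genes simultaneously avoids running one shortest-path computation per known gene; other ways of aggregating the distances (for example an average over the known genes) would need one run per source.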
     Chapter 3 gives a new motif-finding algorithm that explores several new strategies. First, based on the concept of a neighborhood set, a new probability matrix is defined that captures the target motifs effectively. Second, an iterative restart strategy is used, which lets us combine information from several similar motifs to detect the real motif. To demonstrate the effectiveness of the algorithm, we test it on several kinds of real biological sequences and compare its results with those of other recently published algorithms. Simulations show that the algorithm can effectively detect subtle motifs.
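As background for the probability matrix mentioned above, the sketch below builds a generic position probability matrix from aligned sites and uses it to score windows of a sequence. This is a standard textbook construction, not the neighborhood-set matrix or the iterative restart strategy of the thesis, whose details the abstract does not give. The pseudocount, the uniform background and the example sites are assumptions.

```python
import math
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def build_probability_matrix(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """One column of base probabilities per motif position."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1.0
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in ALPHABET})
    return matrix


def log_odds(matrix: List[Dict[str, float]], window: str, background: float = 0.25) -> float:
    """Log-odds score of a window against a uniform background."""
    return sum(math.log2(col[base] / background) for col, base in zip(matrix, window))


def best_window(matrix: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Start index and score of the highest-scoring window in the sequence."""
    width = len(matrix)
    scored = [
        (log_odds(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned binding sites used to train the matrix.
sites = ["TTGACA", "TTGACA", "TTGATA", "TTGACT"]
matrix = build_probability_matrix(sites)
start, score = best_window(matrix, "CCGGTTGACACCGG")
```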
     Chapter 4 combines the algorithm of Chapter 3 with motif information for genes to predict missing pathway genes. In Chapter 2 we used operon information to connect genes within the same operon, but made no connections between different operons. However, operons regulated by the same transcription factor (TF), which are called regulons, are believed to be more closely related in function. We therefore use the predicted motifs of each operon to help find missing pathway genes. Based on the operon information, we extract the promoter sets of similar genes (or operons) and use the method of Chapter 3 to predict motifs for each promoter set. We define a new distance between operons and combine it with the primary results of Chapter 2. The experiments show that predicted motif information is useful and further improves the average PPV: for all pathways with more than 20 genes, the average PPV is 0.846, compared with 0.823 in Chapter 2 without this information.
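The PPV figures quoted in these chapters can be read as the fraction of top-ranked candidates that really belong to the pathway. The sketch below computes that quantity for a ranked list; the function name, the gene identifiers and the evaluation setup are illustrative assumptions rather than the thesis protocol.

```python
from typing import List, Set


def ppv_at_k(ranked: List[str], true_members: Set[str], k: int = 10) -> float:
    """Positive predictive value among the top-k candidates: TP / (TP + FP)."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for gene in top if gene in true_members)
    return hits / len(top)


# Hypothetical ranking and ground truth for one pathway.
ranked = ["g1", "g2", "g3", "g4", "g5"]
true_members = {"g1", "g3", "g9"}
ppv = ppv_at_k(ranked, true_members, k=4)
```

On this reading, the change from 0.823 to 0.846 reported for pathways with more than 20 genes is an absolute gain of 0.023.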
     Chapter 5 presents, for the first time, a web-based tool (http://csbl1.bmb.uga.edu/ffzhou/MUST/) for uncovering and analyzing MITEs at the genome-wide level. It finds all possible MITEs in a given sequence and classifies them into families according to the similarity of their terminal inverted repeats (TIRs) and IR. It also analyzes the MITEs automatically and outputs many related properties. We test the method on Anabaena variabilis ATCC 29413 and successfully find the MITE family Nezha, which has been systematically studied, as well as another active MITE family. We also apply the search program to the genome of Haloquadratum walsbyi DSM 16790, which lives in extremely salty environments, and find three possible MITE families: Duanwu, Qixi and Chongyang. Further analysis shows that Duanwu has left clear footprints of recent transposition and may have been active very recently. Within each MITE family, all copies have conserved TIR and direct repeat (DR) structures and are highly similar to one another. The conservation within the different MITE families and their high copy numbers suggest that there may have been recent MITE bursts in Haloquadratum walsbyi DSM 16790. The MITE Uncovering SysTem (MUST) identifies MITEs in a given genome quickly and reliably. Its application to two prokaryotic genomes, Anabaena variabilis ATCC 29413 and Haloquadratum walsbyi DSM 16790, suggests that many MITE families exist in prokaryotes. In particular, the MITE bursts found in Haloquadratum walsbyi DSM 16790 suggest that the occurrence and mobility of MITEs play an important role in cell function even in species living in extreme environments.
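The structural signature that a MITE search looks for, a pair of terminal inverted repeats flanked by direct repeats, can be illustrated with a naive scan. This is a toy sketch under strong assumptions (exact matches only, short hypothetical repeat lengths, a brute-force search) and is not the MUST algorithm, which the abstract does not describe in detail.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_mite_candidates(
    genome: str,
    tir_len: int = 5,
    dr_len: int = 2,
    min_len: int = 20,
    max_len: int = 60,
) -> List[Tuple[int, int]]:
    """Return (start, end) of elements with matching TIRs and flanking DRs.

    An element genome[start:end] qualifies when its last tir_len bases are
    the reverse complement of its first tir_len bases and the dr_len bases
    immediately before and after it are identical.
    """
    hits = []
    n = len(genome)
    for start in range(dr_len, n - min_len - dr_len + 1):
        left_dr = genome[start - dr_len:start]
        wanted = reverse_complement(genome[start:start + tir_len])
        last_end = min(start + max_len, n - dr_len)
        for end in range(start + min_len, last_end + 1):
            if genome[end - tir_len:end] != wanted:
                continue
            if genome[end:end + dr_len] == left_dr:
                hits.append((start, end))
    return hits


# Hypothetical toy sequence: flank + DR "TA" + element + DR "TA" + flank.
element = "GATTC" + "A" * 10 + "GAATC"
genome = "CCCC" + "TA" + element + "TA" + "GGGG"
candidates = find_mite_candidates(genome)
```

A practical tool would additionally tolerate mismatches in the repeats, filter out coding regions and group the candidates into families by sequence similarity.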
     Chapter 6 reports the first identification of a novel, recently active MITE, Yuanxiao, with 19-bp TIR signals and 9-bp DRs, in the four strains of Leptospira. Using the transposase encoded by ISLin1 in the Leptospira strain Lai, Yuanxiao transposed in the common ancestor of all four sequenced Leptospira strains and remained active very recently, after the divergence of the strains Lai and Copenhageni. A very recent burst of Yuanxiao transpositions is also observed in the four strains. Yuanxiao is the first recently active MITE identified in Leptospira, and it may play a role in regulating neighboring genes.
     Chapter 7 reports the first recently active MITE in Geobacter uraniireducens Rf4, Chunjie, and proposes that it may have proliferated through the transposase encoded by ISGur4, which has very similar TIR signals, since both were identified in Geobacter uraniireducens Rf4 and have almost identical copies with perfect DR signals. The recent transposition of Chunjie is further supported by an insertion of Chunjie into an operon that was duplicated after the divergence of Geobacter uraniireducens Rf4 from its two completely sequenced close relatives, Geobacter metallireducens GS-15 and Geobacter sulfurreducens PCA. Interestingly, the structure of the operon does not seem to be disrupted by the insertion, compared with the other copy of the operon, which was duplicated before the insertion.
     Chapter 8 concludes the whole thesis.
     ·Innovation Points of Thesis
     Innovation point 1. For the first time, an effective method is presented that finds missing pathway genes at the genome level by combining three information sources. A genome reference graph is constructed by comparing 185 genomes, and a graph algorithm is used to find the relationships among genes. The method is very effective and robust. It substantially improves pathway results and reveals additional connections within the target pathway and between pathways.
     Innovation point 1 can be found in Chapter 2.
     Innovation point 2. To further improve the above method of finding missing pathway genes, we additionally use regulon information. We introduce a new motif-finding algorithm and use it to find regulons at the genome level. By incorporating the regulon information, we give a more detailed analysis of the method presented in Chapter 2 and further improve the results of finding missing pathway genes.
     Innovation point 2 can be found in Chapter 3 and Chapter 4.
     Innovation point 3. We present, for the first time, a web-based tool (MUST, http://csbl1.bmb.uga.edu/ffzhou/MUST/) for uncovering and analyzing MITEs at the genome-wide level. Given a genome, the tool finds all possible MITEs and analyzes them automatically. We observe, for the first time, a surprising MITE burst in Haloquadratum walsbyi DSM 16790, which suggests that MITEs are involved in important cell functions even in species living in extreme environments.
     Innovation point 3 can be found in Chapter 5.
     Innovation point 4. A novel, recently active MITE, Yuanxiao, is detected in the four strains of Leptospira. Yuanxiao is the first recently active MITE identified in Leptospira, and it may play a role in regulating neighboring genes.
     Innovation point 4 can be found in Chapter 6.
     Innovation point 5. A novel, recently active MITE, Chunjie, is detected in Geobacter uraniireducens Rf4. Interestingly, the structure of the operon into which Chunjie inserted does not seem to be disrupted by the insertion, compared with the other copy of the operon, which was duplicated before the insertion.
     Innovation point 5 can be found in Chapter 7.
引文
Bennetzen,J.L.(2000) Transposable element contributions to plant gene and genome evolution,Plant Mol Biol,42,251-269.
    Biggs,N.(1993) algebraic graph thoery.Cambridge University Press,Cambridge,UK.
    Breitling,R.,Gilbert,D.,Heiner,M.and Orton,R.(2008) A structured approach for the engineering of biochemical network models,illustrated for signalling pathways,Brief Bioinform.
    Casacuberta,J.M.and Santiago,N.(2003) Plant LTR-retrotransposons and MITEs:control of transposition and impact on the evolution of plant genes and genomes,Gene,311,1-11.
    Chen,Y.,Zhou,F.,Li,G.and Xu,Y.(2008) A recently active MITE,Chunjie,inserted into an operon without disturbing the operon structure in Geobacter uraniireducens Rf4,Genetics.
    Christos H.Papadimitriou,K.S.(1998) Combinatorial Optimization:Algorithms and Complexity Dover Publications.
    De Las Rivas,J.and de Luis,A.(2004) Interactome data and databases:different types of protein interaction,Comp Funct Genomics,5,173-178.
    EMBL-EBI European Bioinformatics Institute.http://www.ebi.ac.uk/Databases.
    Friedman,N.(2004) Inferring cellular networks using probabilistic graphical models,Science,303,799-805.
    Green, M.L. and Karp, P.D. (2004) A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases, BMC Bioinformatics, 5, 76.
    
    Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T., Hill, D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie, K.R., Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E., Hong, E.L., Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T. and White, R. (2004) The Gene Ontology (GO) database and informatics resource, Nucleic Acids Res, 32, D258-261.
    
    Huber, W., Carey, V.J., Long, L., Falcon, S. and Gentleman, R. (2007) Graphs in molecular biology, BMC Bioinformatics, 8 Suppl 6, S8.
    
    Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M. and Hirakawa, M. (2006) From genomics to chemical genomics: new developments in KEGG, Nucleic Acids Res, 34, D354-357.
    
    Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. and Hattori, M. (2004) The KEGG resource for deciphering the genome, Nucleic Acids Res, 32, D277-280.
    
    Karp, P.D., Ouzounis, C.A., Moore-Kochlacs, C., Goldovsky, L., Kaipa, P., Ahren, D., Tsoka, S., Darzentas, N., Kunin, V. and Lopez-Bigas, N. (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes, Nucleic Acids Res, 33, 6083-6089.
    Kharchenko, P., Chen, L., Freund, Y., Vitkup, D. and Church, G.M. (2006) Identifying metabolic enzymes with multiple types of association evidence, BMC Bioinformatics, 7, 177.
    Kim, J.M., Vanguri, S., Boeke, J.D., Gabriel, A. and Voytas, D.F. (1998) Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence, Genome Res, 8,464-478.
    Loot, C., Santiago, N., Sanz, A. and Casacuberta, J.M. (2006) The proteins encoded by the pogo-like Lemil element bind the TIRs and subterminal repeated motifs of the Arabidopsis Emigrant MITE: consequences for the transposition mechanism of MITEs, Nucleic Acids Res, 34, 5238-5246.
    Markowetz, F. and Spang, R. (2007) Inferring cellular networks--a review, BMC Bioinformatics, 8 Suppl 6, S5.
    NCBI National Center for Biotecnology http://www.ncbi.nlm.nih.gov.
    Palomeque, T., Antonio Carrillo, J., Munoz-Lopez, M. and Lorite, P. (2006) Detection of a mariner-like element and a miniature inverted-repeat transposable element (MITE) associated with the heterochromatin from ants of the genus Messor and their possible involvement for satellite DNA evolution, Gene, 371, 194- 205.
    Papadimitriou, C.H. (1993) computational complexity. In. Addison Wesley, Reading, MA, USA. Peregrin-Alvarez, J.M. (2008) Inferring ancestral protein interaction networks, Methods Mol Biol, 452, 417-430.
    
    Schaeffer, S.E. (2007) graph clustering, Computer Science Review, 1,27-64.
    Thomas H. Cormen, C.E.L., Ronald L. Rivest, Clifford Stein (2001) Introduction to Algorithms The MIT Press.
    Thornburg, B.G., Gotea, V. and Makalowski, W. (2006) Transposable elements as a significant source of transcription regulating signals, Gene, 365, 104-110.
    Touchon, M. and Rocha, E.P. (2007) Causes of insertion sequences abundance in prokaryotic genomes, Mol Biol Evol, 24, 969-981.
    Vastrik, I., D'Eustachio, P., Schmidt, E., Joshi-Tope, G., Gopinath, G., Croft, D., de Bono, B., Gillespie, M., Jassal, B., Lewis, S., Matthews, L., Wu, G., Birney, E. and Stein, L. (2007) Reactome: a knowledge base of biologic pathways and processes, Genome Biol, 8, R39.
    Vazirani, V.V. (2004) Approximation Algorithms Springer.
    Verkhedkar, K.D., Raman, K., Chandra, N.R. and Vishveshwara, S. (2007) Metabolome based reaction graphs of M. tuberculosis and M. leprae: a comparative network analysis, PLoS ONE, 2, e881.
    Aravind,L.(2000) Guilt by association:contextual information in genome analysis,Genome Res,10,1074-1077.
    Bebek,G.and Yang,J.(2007) PathFinder:mining signal transduction pathway segments from proteinprotein interaction networks,BMC Bioinformatics,8,335.
    Brouwer,R.W.,Kuipers,O.P.and Hijum,S.A.(2008) The relative value of operon predictions,Brief Bioinform.
    Cakmak,A.and Ozsoyoglu,G.(2007) Mining biological networks for unknown pathways,Bioinformatics,23,2775-2783.
    Choi,K.and Kim,S.(2008) ComPath:comparative enzyme analysis and annotation in pathway/subsystem contexts,BMC Bioinformatics,9,145.
    Clauset,A.,Moore,C.and Newman,M.E.(2008) Hierarchical structure and the prediction of missing links in networks,Nature,453,98-101.
    Cokus,S.,Mizutani,S.and Pellegrini,M.(2007) An improved method for identifying functionally linked proteins using phylogenetic profiles,BMC Bioinformatics,8 Suppl 4,S7.
    Cordwell,S.J.(1999) Microbial genomes and "missing" enzymes:redefining biochemical pathways,Arch Microbiol,172,269-279.
    Dam,P.,Olman,V.,Harris,K.,Su,Z.and Xu,Y.(2007) Operon prediction using both genome-specific and general genomic information,Nucleic Acids Res,35,288-298.
    DeJongh,M.,Formsma,K.,Boillot,P.,Gould,J.,Rycenga,M.and Best,A.(2007) Toward the automated generation of genome-scale metabolic networks in the SEED,BMC Bioinformatics,8,139.
    Enright,A.J.,Iliopouios,I.,Kyrpides,N.C.and Ouzounis,C.A.(1999) Protein interaction maps for complete genomes based on gene fusion events,Nature,402,86-90.
    Eppstein,D.(1998) find the k shortest paths,SIAM J.Computing,211,652-673.
    Feist,A.M.,Henry,C.S.,Reed,J.L.,Krummenacker,M.,Joyce,A.R.,Karp,P.D.,Broadbelt,L.J.,Hatzimanikatis,V.and Palsson,B.O.(2007) A genome-scale metabolic reconstruction for Escherichia coil K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information,Mol Syst Biol,3,121.
    Feist,A.M.and Palsson,B.O.(2008) The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli,Nat Biotechnol,26,659-667.
    Francesconi,M.,Remondini,D.,Neretti,N.,Sedivy,J.M.,Cooper,L.N.,Verondini,E.,Milanesi,L.and Castellani,G.(2008) Reconstructing networks of pathways via significance analysis of their intersections,BMC Bioinformatics,9 Suppl 4,S9.
    Gonzalez,O.and Zimmer,R.(2008) Assigning functional linkages to proteins using phylogenetic profiles and continuous phenotypes,Bioinformatics,24,1257-1263.
    Green, M.L. and Karp, P.D. (2004) A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases, BMC Bioinformatics, 5, 76.
    Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T., Hill, D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie, K.R., Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E., Hong, E.L., Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T. and White, R. (2004) The Gene Ontology (GO) database and informatics resource, Nucleic Acids Res, 32, D258-261.
    
    Hasona, A., Kim, Y., Healy, F.G., Ingram, L.O. and Shanmugam, K.T. (2004) Pyruvate formate lyase and acetate kinase are essential for anaerobic growth of Escherichia coli on xylose, J Bacteriol, 186, 7593-7600.
    
    Huang, Y., Li, H., Hu, H., Yan, X., Waterman, M.S., Huang, H. and Zhou, X.J. (2007) Systematic discovery of functional modules and context-specific functional annotation of human genome, Bioinformatics, 23, i222-229.
    Itoh, T., Takemoto, K., Mori, H. and Gojobori, T. (1999) Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes, Mol Biol Evol, 16, 332-346.
    Jothi, R., Przytycka, T.M. and Aravind, L. (2007) Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment, BMC Bioinformatics, 8,173.
    Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M. and Hirakawa, M. (2006) From genomics to chemical genomics: new developments in KEGG, Nucleic Acids Res, 34, D354-357.
    Kelley, B.P., Sharan, R., Karp, R.M., Sittler, T., Root, D.E., Stockwell, B.R. and Ideker, T. (2003) Conserved pathways within bacteria and yeast as revealed by global protein network alignment, Proc Natl AcadSci USA, 100,11394-11399.
    Kensche, P.R., van Noort, V., Dutilh, B.E. and Huynen, M.A. (2008) Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution, J R Soc Interface, 5, 151-170.
    Kharchenko, P., Chen, L., Freund, Y., Vitkup, D. and Church, G.M. (2006) Identifying metabolic enzymes with multiple types of association evidence, BMC Bioinformatics, 7, 177.
    Kim, Y., Ingram, L.O. and Shanmugam, K.T. (2007) Construction of an Escherichia coli K-12 mutant for homoethanologenic fermentation of glucose or xylose without foreign genes, Appl Environ Microbiol, 73, 1766-1771.
    Kim, Y., Ingram, L.O. and Shanmugam, K.T. (2008) Dihydrolipoamide dehydrogenase mutation alters the NADH sensitivity of pyruvate dehydrogenase complex of Escherichia coli K-12, J Bacteriol, 190, 3851- 3858.
    Kolesov, G., Mewes, H.W. and Frishman, D. (2001) SNAPping up functionally related genes based on context information: a colinearity-free approach, J Mol Biol, 311, 639-656.
    Korbel, J.O., Jensen, L.J., von Mering, C. and Bork, P. (2004) Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs, Nat Biotechnol, 22, 911-917.
    Loganantharaj, R. and Atwi, M. (2007) Towards validating the hypothesis of phylogenetic profiling, BMC Bioinformatics, 8 Suppl 7, S25.
    Maeda, T., Sanchez-Torres, V. and Wood, T.K. (2007) Enhanced hydrogen production from glucose by metabolically engineered Escherichia coli, Appl Microbiol Biotechnol, 77, 879-890.
    Markowetz, F. and Spang, R. (2007) Inferring cellular networks--a review, BMC Bioinformatics, 8 Suppl 6, S5.
    Mushegian, A.R. and Koonin, E.V. (1996) Gene order is not conserved in bacterial evolution, Trends Genet, 12,289-290.
    Osterman, A. and Overbeek, R. (2003) Missing genes in metabolic pathways: a comparative genomics approach, Curr Opin Chem Biol, 7, 238-251.
    Pandey, J., Koyuturk, M., Kim, Y., Szpankowski, W., Subramaniam, S. and Grama, A. (2007) Functional annotation of regulatory pathways, Bioinformatics, 23, i377-386.
    Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D. and Yeates, T.O. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles, Proc Natl Acad Sci U S A, 96, 4285-4288.
    Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. and Barabasi, A.L. (2002) Hierarchical organization of modularity in metabolic networks, Science, 297, 1551-1555.
    Riley, M., Abe, T., Arnaud, M.B., Berlyn, M.K., Blattner, F.R., Chaudhuri, R.R., Glasner, J.D., Horiuchi, T., Keseler, I.M., Kosuge, T., Mori, H., Perna, N.T., Plunkett, G., 3rd, Rudd, K.E., Serres, M.H., Thomas, G.H., Thomson, N.R., Wishart, D. and Wanner, B.L. (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot--2005, Nucleic Acids Res, 34, 1-9.
    
    Roth, C, Rastogi, S., Arvestad, L., Dittmar, K., Light, S., Ekman, D. and Liberles, D.A. (2007) Evolution after gene duplication: models, mechanisms, sequences, systems, and organisms, J Exp Zoolog B Mol Dev Evol, 308, 58-73.
    
    Salgado, H., Gama-Castro, S., Peralta-Gil, M., Diaz-Peredo, E., Sanchez-Solano, F., Santos-Zavaleta, A., Martinez-Flores, I., Jimenez-Jacinto, V., Bonavides-Martinez, C, Segura-Salazar, J., Martinez-Antonio, A. and Collado-Vides, J. (2006) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions, Nucleic Acids Res, 34, D394-397.
    
    Sanguinetti, G., Noirel, J. and Wright, P.C. (2008) MMG: a probabilistic tool to identify submodules of metabolic pathways, Bioinformatics, 24, 1078-1084.
    Satish Kumar, V., Dasika, M.S. and Maranas, C.D. (2007) Optimization based automated curation of metabolic reconstructions, BMC Bioinformatics, 8, 212.
    Shiga, M., Takigawa, I. and Mamitsuka, H. (2007) Annotating gene function by combining expression data with a modular gene network, Bioinformatics, 23, i468-478.
    Snitkin, E.S., Gustafson, A.M., Mellor, J., Wu, J. and DeLisi, C. (2006) Comparative assessment of performance and genome dependence among phylogenetic profiling methods, BMC Bioinformatics, 7, 420.
    Spirin, V., Gelfand, M.S., Mironov, A.A. and Mirny, L.A. (2006) A metabolic network in the evolutionary context: multiscale structure and modularity, Proc Natl Acad Sci U S A, 103, 8774-8779.
    Sun, J., Xu, J., Liu, Z., Liu, Q., Zhao, A., Shi, T. and Li, Y. (2005) Refined phylogenetic profiles method for predicting protein-protein interactions, Bioinformatics, 21, 3409-3415.
    
    Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacobs, A.R., Kiryutin, B., Koonin, E.V., Krylov, D.M., Mazumder, R., Mekhedov, S.L., Nikolskaya, A.N., Rao, B.S., Smirnov, S., Sverdlov, A.V., Vasudevan, S., Wolf, Y.I., Yin, J.J. and Natale, D.A. (2003) The COG database: an updated version includes eukaryotes, BMC Bioinformatics, 4,41.
    To, C.C. and Vohradsky, J. (2008) Supervised inference of gene-regulatory networks, BMC Bioinformatics, 9,2.
    Ulitsky, I. and Shamir, R. (2007) Identification of functional modules using network topology and high-throughput data, BMC Syst Biol, 1,8.
    Werhli, A.V., Grzegorczyk, M. and Husmeier, D. (2006) Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks, Bioinformatics, 22, 2523-2531.
    Wierling, C., Herwig, R. and Lehrach, H. (2007) Resources, standards and tools for systems biology, Brief Funct Genomic Proteomic, 6,240-251.
    Wu, H., Su, Z., Mao, F., Olman, V. and Xu, Y. (2005) Prediction of functional modules based on comparative genome analysis and Gene Ontology application, Nucleic Acids Res, 33,2822-2837.
    Wu, J., Hu, Z. and DeLisi, C. (2006) Gene annotation and network inference by phylogenetic profiling, BMC Bioinformatics, 7, 80.
    Yamada, T., Kanehisa, M. and Goto, S. (2006) Extraction of phylogenetic network modules from the metabolic network, BMC Bioinformatics, 7,130.
    Yamanishi, Y., Vert, J.P. and Kanehisa, M. (2004) Protein network inference from multiple genomic data: a supervised approach, Bioinformatics, 20 Suppl 1, i363-370.
    Yan, X., Mehan, M.R., Huang, Y., Waterman, M.S., Yu, P.S. and Zhou, X.J. (2007) A graph-based approach to systematically reconstruct human transcriptional regulatory modules, Bioinformatics, 23, i577-586.
    Yanai, I. and DeLisi, C. (2002) The society of genes: networks of functional links between genes from comparative genomics, Genome Biol, 3, research0064.
    Zhao, X.M., Wang, R.S., Chen, L. and Aihara, K. (2008) Uncovering signal transduction networks from high-throughput data by integer linear programming, Nucleic Acids Res, 36, e48.
    Bailey T. and Elkan C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization, Machine Learning, 21,51-80.
    Brazma A. and Jonassen I. (1998) Predicting gene regulatory elements in silico on a genomic scale, Genome Research, 8, 1202-1215.
    Brazma A., Jonassen I., Eidhammer I. and Gilbert D. (1998) Approaches to the automatic discovery of patterns in biosequences, Journal of Computational Biology, 5, 279-305.
    Ettwiller L. M. and Rung J. (2003) Discovering novel cis-regulatory motifs using functional networks, Genome Research, 13(5), 883-895.
    Fraenkel Y., Mandel Y., Friedberg D. and Margalit H. (1995) Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon, Comp. Appl. Biosci., 11, 379-387.
    Frances M. and Litman A. (1997) On covering problems of codes, Theory of Computing Systems, 30, 113-119.
    Gelfand M., Koonin E. and Mironov A. (2000) Prediction of transcription regulatory sites in Archaea by a comparative genomic approach, Nucleic Acids Res., 28, 695-705.
    van Helden J., Andre B. and Collado-Vides J. (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies, J. Mol. Biol., 281, 827-842.
    van Helden J., Rios A. F. and Collado-Vides J. (2000) Discovering regulatory elements in non-coding sequences by analysis of spaced dyads, Nucleic Acids Research, 28, 1808-1818.
    
    Hu Yuh-Jyh (2003) Finding Subtle Motifs with Variable Gaps in Unaligned DNA sequences, Computer Methods and Programs in Biomedicine, 70, 11-20.
    Hughes J. D., Estep P. W., Tavazoie S., Church G. M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae, Journal of Molecular Biology, 296(5), 1205-1214.
    Keich, U., Pevzner, P. A. (2002) Finding motifs in the twilight zone. Bioinformatics, 18,1374-1381.
    Li Guojun, Lu Jizhu, Olman Victor and Xu Ying (2005) PROMOCO: a New Program for Prediction of cis Regulatory Elements: From High-Information Content Analysis to Clique Identification, Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference Workshops (CSBW'05).
    Li M., Ma B. and Wang L. (1999) Finding similar regions in many strings, Proceedings of the 31st ACM Annual Symposium on Theory of Computing, 473-482.
    
    Li M., Ma, B. and Wang L. S. (2002) On the closest string and substring problems, Journal of the ACM, 49(2), 157-171.
    
    Liu X, Brutlag D L and Liu J S (2001) BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomputing, 127-138.
    Pesole G., Prunella N., Liuni S., Attimonelli M. and Saccone C. (1992) WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences, Nucleic Acids Res., 20, 2871-2875.
    Pevzner P. and Sze S. (2000) Combinatorial approaches to finding subtle signals in DNA sequences, Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, 269-278.
    Quandt K., Frech K., Karas H., Wingender E. and Werner T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data, Nucleic Acids Research, 23, 4878-4884.
    Rigoutsos I. and Floratos A. (1998) Combinatorial pattern discovery in biological sequences, Bioinformatics, 14, 55-67.
    Sinha S. and Tompa M. (2002) Discovery of novel transcription factor binding sites by statistical overrepresentation, Nucleic Acids Research, 30(24), 5549.
    Staden R. (1989) Methods for discovering novel motifs in nucleic acid sequences, Computer Applications in Biosciences, 5, 293-298.
    Tompa M. (1999) An exact method for finding short motifs in sequences with application to the Ribosome Binding Site problem, Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, 262-271.
    Wolfertstetter F., Frech K., Herrmann G. and Werner T. (1996) Identification of functional elements in unaligned nucleic acid sequences, Computer Applications in Biosciences, 12, 71-80.
    Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., Wheeler, R., Wong, B., Drenkow, J., Yamanaka, M., Patel, S., Brubaker, S., Tammana, H., Helt, G., Struhl, K. and Gingeras, T.R. (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs, Cell, 116,499-509.
    
    Erill, I., Escribano, M., Campoy, S. and Barbe, J. (2003) In silico analysis reveals substantial variability in the gene contents of the gamma proteobacteria LexA-regulon, Bioinformatics, 19,2225-2236.
    
    Gama-Castro, S., Jimenez-Jacinto, V., Peralta-Gil, M., Santos-Zavaleta, A., Penaloza-Spinola, M.I., Contreras-Moreira, B., Segura-Salazar, J., Muniz-Rascado, L., Martinez-Flores, I., Salgado, H., Bonavides-Martinez, C., Abreu-Goodger, C., Rodriguez-Penagos, C., Miranda-Rios, J., Morett, E., Merino, E., Huerta, A.M., Trevino-Quintanilla, L. and Collado-Vides, J. (2008) RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation, Nucleic Acids Res, 36, D120-124.
    
    Gelfand, M.S., Novichkov, P.S., Novichkova, E.S. and Mironov, A.A. (2000) Comparative analysis of regulatory patterns in bacterial genomes, Brief Bioinform, 1,357-371.
    
    Impey, S., McCorkle, S.R., Cha-Molstad, H., Dwyer, J.M., Yochum, G.S., Boss, J.M., McWeeney, S., Dunn, J.J., Mandel, G. and Goodman, R.H. (2004) Defining the CREB regulon: a genome-wide analysis of transcription factor regulatory regions, Cell, 119, 1041-1054.
    
    Klein, J., Leupold, S., Munch, R., Pommerenke, C., Johl, T., Karst, U., Jansch, L., Jahn, D. and Retter, I. (2008) ProdoNet: identification and visualization of prokaryotic gene regulatory and metabolic networks, Nucleic Acids Res, 36, W460-464.
    
    Kremling, A., Jahreis, K., Lengeler, J.W. and Gilles, E.D. (2000) The organization of metabolic reaction networks: a signal-oriented approach to cellular models, Metab Eng, 2, 190-200.
    
    Li, G.L.B.C., Dongsheng; Sun, Johathan; Xu, Ying (to appear), Bioinformatics.
    
    Makarova, K.S., Mironov, A.A. and Gelfand, M.S. (2001) Conservation of the binding site for the arginine repressor in all bacterial lineages, Genome Biol, 2, RESEARCH0013.
    
    Munch, R., Hiller, K., Grote, A., Scheer, M., Klein, J., Schobert, M. and Jahn, D. (2005) Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes, Bioinformatics, 21,4187-4189.
    
    Pachkov, M., Erb, I., Molina, N. and van Nimwegen, E. (2007) SwissRegulon: a database of genome-wide annotations of regulatory sites, Nucleic Acids Res, 35, D127-131.
    
    Rajewsky, N., Socci, N.D., Zapotocky, M. and Siggia, E.D. (2002) The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons, Genome Res, 12, 298-308.
    
    Rodionov, D.A., Mironov, A.A. and Gelfand, M.S. (2001) Transcriptional regulation of pentose utilisation systems in the Bacillus/Clostridium group of bacteria, FEMS Microbiol Lett, 205, 305-314.
    Smith, T.F., Waterman, M.S. and Fitch, W.M. (1981) Comparative biosequence metrics, J Mol Evol, 18, 38-46.
    
    Su, Z., Olman, V., Mao, F. and Xu, Y. (2005) Comparative genomics analysis of NtcA regulons in cyanobacteria: regulation of nitrogen assimilation and its coupling to photosynthesis, Nucleic Acids Res, 33, 5156-5171.
    
    Su, Z., Olman, V. and Xu, Y. (2007) Computational prediction of Pho regulons in cyanobacteria, BMC Genomics, 8, 156.
    Benlloch, S., Acinas, S.G., Anton, J., Lopez-Lopez, A., Luz, S.P. and Rodriguez-Valera, F. (2001) Archaeal Biodiversity in Crystallizer Ponds from a Solar Saltern: Culture versus PCR, Microb Ecol, 41, 12-19.
    Bolhuis, H., Palm, P., Wende, A., Falb, M., Rampp, M., Rodriguez-Valera, F., Pfeiffer, F. and Oesterhelt, D. (2006) The genome of the square archaeon Haloquadratum walsbyi: life at the limits of water activity, BMC Genomics, 7, 169.
    Bureau, T.E. and Wessler, S.R. (1994) Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses, Proc Natl Acad Sci U S A, 91, 1411-1415.
    Burns, D.G., Janssen, P.H., Itoh, T., Kamekura, M., Li, Z., Jensen, G., Rodriguez-Valera, F., Bolhuis, H. and Dyall-Smith, M.L. (2007) Haloquadratum walsbyi gen. nov., sp. nov., the square haloarchaeon of Walsby, isolated from saltern crystallizers in Australia and Spain, Int J Syst Evol Microbiol, 57, 387-392.
    Casacuberta, J.M. and Santiago, N. (2003) Plant LTR-retrotransposons and MITEs: control of transposition and impact on the evolution of plant genes and genomes, Gene, 311, 1-11.
    Dongen, S.V. (2008) Graph Clustering Via a Discrete Uncoupling Process, SIAM Journal on Matrix Analysis and Applications, 30, 121-141.
    Dufresne, M., Hua-Van, A., El Wahab, H.A., Ben M'Barek, S., Vasnier, C., Teysset, L., Kema, G.H. and Daboussi, M.J. (2007) Transposition of a fungal miniature inverted-repeat transposable element through the action of a Tc1-like transposase, Genetics, 175, 441-452.
    
    Falb, M., Pfeiffer, F., Palm, P., Rodewald, K., Hickmann, V., Tittor, J. and Oesterhelt, D. (2005) Living with two extremes: conclusions from the genome sequence of Natronomonas pharaonis, Genome Res, 15, 1336-1343.
    Gruber, A.R., Neubock, R., Hofacker, I.L. and Washietl, S. (2007) The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures, Nucleic Acids Res, 35, W335-338.
    He, S., Liu, C., Skogerbo, G., Zhao, H., Wang, J., Liu, T., Bai, B., Zhao, Y. and Chen, R. (2008) NONCODE v2.0: decoding the non-coding, Nucleic Acids Res, 36, D170-172.
    
    Jiang, N., Bao, Z., Zhang, X., Hirochika, H., Eddy, S.R., McCouch, S.R. and Wessler, S.R. (2003) An active DNA transposon family in rice, Nature, 421,163-167.
    Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O. and Walichiewicz, J. (2005) Repbase Update, a database of eukaryotic repetitive elements, Cytogenet Genome Res, 110,462-467.
    Kikuchi, K., Terauchi, K., Wada, M. and Hirano, H.Y. (2003) The plant MITE mPing is mobilized in anther culture, Nature, 421, 167-170.
    Krishnan, A. and Tang, F. (2004) Exhaustive whole-genome tandem repeats search, Bioinformatics, 20, 2702-2710.
    Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J. and Giegerich, R. (2001) REPuter: the manifold applications of repeat analysis on a genomic scale, Nucleic Acids Res, 29, 4633-4642.
    Loot, C., Santiago, N., Sanz, A. and Casacuberta, J.M. (2006) The proteins encoded by the pogo-like Lemi1 element bind the TIRs and subterminal repeated motifs of the Arabidopsis Emigrant MITE: consequences for the transposition mechanism of MITEs, Nucleic Acids Res, 34, 5238-5246.
    Mason-Gamer, R.J. (2007) Multiple homoplasious insertions and deletions of a Triticeae (Poaceae) DNA transposon: a phylogenetic perspective, BMC Evol Biol, 7, 92.
    Mathews, D.H., Sabina, J., Zuker, M. and Turner, D.H. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure, J Mol Biol, 288, 911-940.
    Mazzone, M., De Gregorio, E., Lavitola, A., Pagliarulo, C., Alifano, P. and Di Nocera, P.P. (2001) Whole-genome organization and functional properties of miniature DNA insertion sequences conserved in pathogenic Neisseriae, Gene, 278,211-222.
    
    Menzel, G., Dechyeva, D., Keller, H., Lange, C., Himmelbauer, H. and Schmidt, T. (2006) Mobilization and evolutionary history of miniature inverted-repeat transposable elements (MITEs) in Beta vulgaris L, Chromosome Res, 14, 831-844.
    Nakazaki, T., Okumoto, Y., Horibata, A., Yamahira, S., Teraishi, M., Nishida, H., Inoue, H. and Tanisaka, T. (2003) Mobilization of a transposon in the rice genome, Nature, 421,170-172.
    NCBI_Genome_Database [http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi].
    Palomeque, T., Antonio Carrillo, J., Munoz-Lopez, M. and Lorite, P. (2006) Detection of a mariner-like element and a miniature inverted-repeat transposable element (MITE) associated with the heterochromatin from ants of the genus Messor and their possible involvement for satellite DNA evolution, Gene, 371, 194-205.
    
    Kolpakov, R. and Kucherov, G. (2003) Finding approximate repetitions under Hamming distance, Theoretical Computer Science, 303, 135-136.
    
    Santiago, N., Herraiz, C., Goni, J.R., Messeguer, X. and Casacuberta, J.M. (2002) Genome-wide analysis of the Emigrant family of MITEs of Arabidopsis thaliana, Mol Biol Evol, 19,2285-2293.
    
    Schenke, D., Sasabe, M., Toyoda, K., Inagaki, Y.S., Shiraishi, T. and Ichinose, Y. (2003) Genomic structure of the NtPDR1 gene, harboring the two miniature inverted-repeat transposable elements, NtToya1 and NtStowaway101, Genes Genet Syst, 78, 409-418.
    
    Siguier, P., Perochon, J., Lestrade, L., Mahillon, J. and Chandler, M. (2006) ISfinder: the reference centre for bacterial insertion sequences, Nucleic Acids Res, 34, D32-36.
    
    Stoeckenius, W. (1981) Walsby's square bacterium: fine structure of an orthogonal procaryote, J Bacteriol, 148,352-360.
    
    Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0, Mol Biol Evol, 24, 1596-1599.
    
    Tu, Z. (1997) Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti, Proc Natl Acad Sci U S A, 94, 7475-7480.
    
    Tu, Z. (2001) Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae, Proc Natl Acad Sci U S A, 98,1699-1704.
    Turcotte, K. and Bureau, T. (2002) Phylogenetic analysis reveals stowaway-like elements may represent a fourth family of the IS630-Tc1-mariner superfamily, Genome, 45, 82-90.
    Washietl, S., Hofacker, I.L. and Stadler, P.F. (2005) Fast and reliable prediction of noncoding RNAs, Proc Natl Acad Sci U S A, 102, 2454-2459.
    Yang, G. and Hall, T.C. (2003) MAK, a computational tool kit for automated MITE analysis, Nucleic Acids Res, 31, 3659-3665.
    Yang, G., Zhang, F., Hancock, C.N. and Wessler, S.R. (2007) Transposition of the rice miniature inverted repeat transposable element mPing in Arabidopsis thaliana, Proc Natl Acad Sci USA, 104, 10962-10967.
    Zhang, X., Feschotte, C., Zhang, Q., Jiang, N., Eggleston, W.B. and Wessler, S.R. (2001) P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases, Proc Natl Acad Sci U S A, 98, 12572-12577.
    Zhou, F., Olman, V. and Xu, Y. (2008) Insertion Sequences show diverse recent activities in Cyanobacteria and Archaea, BMC Genomics, 9, 36.
    Zhou, F., Tran, T. and Xu, Y. (2008) Nezha, a novel active miniature inverted-repeat transposable element in cyanobacteria, Biochem Biophys Res Commun, 365,790-794.
    Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction, Nucleic Acids Res, 31,3406-3415.
    Bompfunewerer, A. F., C. Flamm, C. Fried, G. Fritzsch, I. L. Hofacker, J. Lehmann, K. Missal, A. Mosig, B. Muller, S. J. Prohaska, B. M. R. Stadler, P. F. Stadler, A. Tanzer, S. Washietl & C. Witwer, (2005) Evolutionary patterns of non-coding RNAs. Theory in Biosciences 123: 301-369.
    
    Bureau, T. E. & S. R. Wessler, (1992) Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4: 1283-1294.
    
    Bureau, T. E. & S. R. Wessler, (1994a) Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc Natl Acad Sci U S A 91: 1411-1415.
    Bureau, T. E. & S. R. Wessler, (1994b) Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6: 907-916.
    Casa, A. M., C. Brouwer, A. Nagel, L. Wang, Q. Zhang, S. Kresovich & S. R. Wessler, (2000) Inaugural article: the MITE family heartbreaker (Hbr): molecular markers in maize. Proc Natl Acad Sci U S A 97: 10083-10089.
    Chandler, M. & J. Mahillon, (2002) Insertion sequences revisited. In: Mobile DNA II. N. L. Craig, R. Craigie, M. Gellert & A. M. Lambowitz (eds). Washington, DC, USA: American Society of Microbiology, pp.
    Chen, Y., F. Zhou, G. Li & Y. Xu, (2008) A recently active MITE, Chunjie, inserted into an operon without disturbing the operon structure in Geobacter uraniireducens Rf4. Accepted by Genetics.
    Delihas, N., (2008) Small mobile sequences in bacteria display diverse structure/function motifs. Mol Microbiol 67: 475-481.
    Feschotte, C., X. Zhang & S. R. Wessler, (2002a) Miniature Inverted-Repeat Transposable Elements and Their Relationship to Established DNA Transposons. In: Mobile DNA II. N. L. Craig, R. Craigie, M. Gellert & A. M. Lambowitz (eds). Washington, D.C.: ASM Press, pp.
    Feschotte, C., X. Zhang & S. R. Wessler, (2002b) Miniature Inverted-Repeat Transposable Elements and Their Relationship to Established DNA Transposons. In: Mobile DNA II. N. L. Craig, R. Craigie, M. Gellert & A. M. Lambowitz (eds). Washington, D.C.: ASM Press, pp. 1147-1158.
    Filee, J., P. Siguier & M. Chandler, (2007) Insertion sequence diversity in archaea. Microbiol Mol Biol Rev 71:121-157.
    Gruber, A. R., R. Neubock, I. L. Hofacker & S. Washietl, (2007) The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures. Nucleic Acids Res 35: W335-338.
    Jiang, N., Z. Bao, X. Zhang, H. Hirochika, S. R. Eddy, S. R. McCouch & S. R. Wessler, (2003) An active DNA transposon family in rice. Nature 421: 163-167.
    Jurka, J., V. V. Kapitonov, A. Pavlicek, P. Klonowski, O. Kohany & J. Walichiewicz, (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110:462-467.
    Kidwell, M. G., (2002) Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49-63.
    Lepetit, D., S. Pasquet, M. Olive, N. Theze & P. Thiebaud, (2000) Glider and Vision: two new families of miniature inverted-repeat transposable elements in Xenopus laevis genome. Genetica 108: 163-169.
    Mahillon, J. & M. Chandler, (1998) Insertion sequences. Microbiol Mol Biol Rev 62: 725-774.
    Mathews, D. H., J. Sabina, M. Zuker & D. H. Turner, (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288: 911-940.
    Mazzone, M., E. De Gregorio, A. Lavitola, C. Pagliarulo, P. Alifano & P. P. Di Nocera, (2001) Whole-genome organization and functional properties of miniature DNA insertion sequences conserved in pathogenic Neisseriae. Gene 278: 211-222.
    
    Mennecier, S., P. Servant, G. Coste, A. Bailone & S. Sommer, (2006) Mutagenesis via IS transposition in Deinococcus radiodurans. Mol Microbiol 59: 317-325.
    
    Ren, S. X., G. Fu, X. G. Jiang, R. Zeng, Y. G. Miao, H. Xu, Y. X. Zhang, H. Xiong, G. Lu, L. F. Lu, H. Q. Jiang, J. Jia, Y. F. Tu, J. X. Jiang, W. Y. Gu, Y. Q. Zhang, Z. Cai, H. H. Sheng, H. F. Yin, Y. Zhang, G. F. Zhu, M. Wan, H. L. Huang, Z. Qian, S. Y. Wang, W. Ma, Z. J. Yao, Y. Shen, B. Q. Qiang, Q. C. Xia, X. K. Guo, A. Danchin, I. Saint Girons, R. L. Somerville, Y. M. Wen, M. H. Shi, Z. Chen, J. G. Xu & G. P. Zhao, (2003) Unique physiological and pathogenic features of Leptospira interrogans revealed by whole-genome sequencing. Nature 422: 888-893.
    Siguier, P., J. Perochon, L. Lestrade, J. Mahillon & M. Chandler, (2006) ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 34: D32-36.
    
    Smit, A. F. & A. D. Riggs, (1996) Tiggers and DNA transposon fossils in the human genome. Proc Natl Acad Sci USA 93: 1443-1448.
    Storz, G., (2002) An expanding universe of noncoding RNAs. Science 296: 1260-1263.
    Tamura, K., J. Dudley, M. Nei & S. Kumar, (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596-1599.
    Tu, Z., (1997) Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc Natl Acad Sci U S A 94: 7475-7480.
    Van den Broeck, D., T. Maes, M. Sauer, J. Zethof, P. De Keukeleire, M. D'Hauw, M. Van Montagu & T. Gerats, (1998) Transposon Display identifies individual transposable elements in high copy number lines. Plant J 13: 121-129.
    Washietl, S., I. L. Hofacker & P. F. Stadler, (2005) Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA 102: 2454-2459.
    Wheeler, D. L., T. Barrett, D. A. Benson, S. H. Bryant, K. Canese, V. Chetvernin, D. M. Church, M. Dicuccio, R. Edgar, S. Federhen, M. Feolo, L. Y. Geer, W. Helmberg, Y. Kapustin, O. Khovayko, D. Landsman, D. J. Lipman, T. L. Madden, D. R. Maglott, V. Miller, J. Ostell, K. D. Pruitt, G. D. Schuler, M. Shumway, E. Sequeira, S. T. Sherry, K. Sirotkin, A. Souvorov, G. Starchenko, R. L. Tatusov, T. A. Tatusova, L. Wagner & E. Yaschenko, (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 36: D13-21.
    Xu, M., R. Kong & X. Xu, (2006) A miniature insertion element transposable in Microcystis sp. FACHB 854. Progress in Natural Science 16: 486-491.
    
    Zhou, F., V. Olman & Y. Xu, (2008a) Insertion Sequences show diverse recent activities in Cyanobacteria and Archaea. BMC Genomics 9: 36.
    Zhou, F., T. Tran & Y. Xu, (2008b) Nezha, a novel active miniature inverted-repeat transposable element in cyanobacteria. Biochem Biophys Res Commun 365: 790-794.
    BUREAU, T. E., and S. R. WESSLER, 1992 Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4: 1283-1294.
    BUREAU, T. E., and S. R. WESSLER, 1994a Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc Natl Acad Sci U S A 91: 1411-1415.
    BUREAU, T. E., and S. R. WESSLER, 1994b Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6: 907-916.
    CHANDLER, M., and J. MAHILLON, 2002 Insertion sequences revisited in Mobile DNA II, edited by N. L. CRAIG, R. CRAIGIE, M. GELLERT and A. M. LAMBOWITZ. American Society of Microbiology, Washington, DC, USA.
    DELCHER, A. L., K. A. BRATKE, E. C. POWERS and S. L. SALZBERG, 2007 Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23: 673-679.
    FESCHOTTE, C., X. ZHANG and S. R. WESSLER, 2002 Miniature Inverted-Repeat Transposable Elements and Their Relationship to Established DNA Transposons in Mobile DNA II, edited by N. L. CRAIG, R. CRAIGIE, M. GELLERT and A. M. LAMBOWITZ. ASM Press, Washington, D.C.
    FILEE, J., P. SIGUIER and M. CHANDLER, 2007 Insertion sequence diversity in archaea. Microbiol Mol Biol Rev 71: 121-157.
    GRUBER, A. R., R. NEUBOCK, I. L. HOFACKER and S. WASHIETL, 2007 The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures. Nucleic Acids Res 35: W335-338.
    HE, S., C. LIU, G. SKOGERBO, H. ZHAO, J. WANG et al., 2008 NONCODE v2.0: decoding the non-coding. Nucleic Acids Res 36: D170-172.
    JIANG, N., Z. BAO, X. ZHANG, H. HIROCHIKA, S. R. EDDY et al., 2003 An active DNA transposon family in rice. Nature 421: 163-167.
    JURKA, J., V. V. KAPITONOV, A. PAVLICEK, P. KLONOWSKI, O. KOHANY et al., 2005 Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110: 462-467.
    Kidwell, M. G., 2002 Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49-63.
    Lepetit, D., S. Pasquet, M. Olive, N. Theze and P. Thiebaud, 2000 Glider and Vision: two new families of miniature inverted-repeat transposable elements in Xenopus laevis genome. Genetica 108: 163-169.
    MATHEWS, D. H., J. SABINA, M. ZUKER and D. H. TURNER, 1999 Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288: 911-940.
    Mazzone, M., E. De Gregorio, A. Lavitola, C. Pagliarulo, P. Alifano et al., 2001 Whole-genome organization and functional properties of miniature DNA insertion sequences conserved in pathogenic Neisseriae. Gene 278: 211-222.
    MCGINNIS, S., and T. L. MADDEN, 2004 BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32: W20-25.
    Mennecier, S., P. Servant, G. Coste, A. BAILONE and S. Sommer, 2006 Mutagenesis via IS transposition in Deinococcus radiodurans. Mol Microbiol 59: 317-325.
    SIGUIER, P., J. PEROCHON, L. Lestrade, J. Mahillon and M. CHANDLER, 2006 ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res 34: D32-36.
    SMIT, A. F., and A. D. RIGGS, 1996 Tiggers and DNA transposon fossils in the human genome. Proc Natl Acad Sci U S A 93: 1443-1448.
    TAMURA, K., J. DUDLEY, M. NEI and S. KUMAR, 2007 MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Mol Biol Evol 24: 1596-1599.
    Tu, Z., 1997 Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc Natl Acad Sci U S A 94: 7475-7480.
    Tu, Z., 2001 Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae. Proc Natl Acad Sci U S A 98: 1699-1704.
    WASHIETL, S., I. L. HOFACKER and P. F. STADLER, 2005 Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A 102: 2454-2459.
    WESTOVER, B. P., J. D. BUHLER, J. L. SONNENBURG and J. I. GORDON, 2005 Operon prediction without a training set. Bioinformatics 21: 880-888.
    Xu, M., R. KONG and X. Xu, 2006 A miniature insertion element transposable in Microcystis sp. FACHB 854. Progress in Natural Science 16: 486-491.
    Yang, G., F. Zhang, C. N. Hancock and S. R. Wessler, 2007 Transposition of the rice miniature inverted repeat transposable element mPing in Arabidopsis thaliana. Proc Natl Acad Sci U S A 104: 10962-10967.
    Yu, J., S. HU, J. Wang, G. K. Wong, S. Li et al., 2002 A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92.
    Zhou, F., T. Tran and Y. Xu, 2008 Nezha, a novel active miniature inverted-repeat transposable element in cyanobacteria. Biochem Biophys Res Commun 365: 790-794.
    Enright, A.J., Iliopoulos, I., Kyrpides, N.C. and Ouzounis, C.A. (1999) Protein interaction maps for complete genomes based on gene fusion events, Nature, 402, 86-90.
    Fani, R., Brilli, M., Fondi, M. and Lio, P. (2007) The role of gene fusions in the evolution of metabolic pathways: the histidine biosynthesis case, BMC Evol Biol, 7 Suppl 2, S4.
    Flannick, J., Novak, A., Srinivasan, B.S., McAdams, H.H. and Batzoglou, S. (2006) Graemlin: general and robust alignment of multiple large interaction networks, Genome Res, 16, 1169-1181.
    Gamalielsson, J. and Olsson, B. (2008) Gene ontology-based semantic alignment of biological pathways by evolutionary search, J Bioinform Comput Biol, 6, 825-842.
    Green, M.L. and Karp, P.D. (2004) A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases, BMC Bioinformatics, 5, 76.
    Jiang, N., Bao, Z., Zhang, X., Hirochika, H., Eddy, S.R., McCouch, S.R. and Wessler, S.R. (2003) An active DNA transposon family in rice, Nature, 421, 163-167.
    Kamburov, A., Goldovsky, L., Freilich, S., Kapazoglou, A., Kunin, V., Enright, A.J., Tsaftaris, A. and Ouzounis, C.A. (2007) Denoising inferred functional association networks obtained by gene fusion analysis, BMC Genomics, 8, 460.
    Marshall, A. and Hodgson, J. (1998) DNA chips: an array of possibilities, Nat Biotechnol, 16, 27-31.
    Menzel, G., Dechyeva, D., Keller, H., Lange, C., Himmelbauer, H. and Schmidt, T. (2006) Mobilization and evolutionary history of miniature inverted-repeat transposable elements (MITEs) in Beta vulgaris L, Chromosome Res, 14, 831-844.
    Ramsay, G. (1998) DNA chips: state-of-the art, Nat Biotechnol, 16, 40-44.
    Singh, R., Xu, J. and Berger, B. (2008) Global alignment of multiple protein interaction networks with application to functional orthology detection, Proc Natl Acad Sci U S A, 105, 12763-12768.
    Teixeira, M.T., Dujon, B. and Fabre, E. (2002) Genome-wide nuclear morphology screen identifies novel genes involved in nuclear architecture and gene-silencing in Saccharomyces cerevisiae, J Mol Biol, 321, 551-561.
    Vandenbroucke, K., Robbens, S., Vandepoele, K., Inze, D., Van de Peer, Y. and Van Breusegem, F. (2008) Hydrogen peroxide-induced gene expression across kingdoms: a comparative analysis, Mol Biol Evol, 25, 507-516.
    Wernicke, S. and Rasche, F. (2007) Simple and fast alignment of metabolic pathways by exploiting local diversity, Bioinformatics, 23,1978-1985.
    Wodicka, L., Dong, H., Mittmann, M., Ho, M.H. and Lockhart, D.J. (1997) Genome-wide expression monitoring in Saccharomyces cerevisiae, Nat Biotechnol, 15, 1359-1367.
    Wu, H., Su, Z., Mao, F., Olman, V. and Xu, Y. (2005) Prediction of functional modules based on comparative genome analysis and Gene Ontology application, Nucleic Acids Res, 33,2822-2837.
    Zhang, S., Zhang, X.S. and Chen, L. (2008) Biomolecular network querying: a promising approach in systems biology, BMC Syst Biol, 2, 5.
    Zhang, X., Feschotte, C., Zhang, Q., Jiang, N., Eggleston, W.B. and Wessler, S.R. (2001) P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases, Proc Natl Acad Sci U S A, 98, 12572-12577.
    Zhenping, L., Zhang, S., Wang, Y., Zhang, X.S. and Chen, L. (2007) Alignment of molecular networks by integer quadratic programming, Bioinformatics, 23,1631-1639.
