Research on Several Key Problems in DNA Microarray Design and Data Processing
Abstract
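The application of DNA microarray technology has brought profound changes to modern biological research. Its emergence and development are the result of the convergence of multiple disciplines, and solving several of its key problems, in particular microarray design and data analysis, depends heavily on bioinformatics research. Focusing on these two problems, this dissertation studies probe design for 16S rRNA microarrays for large-scale bacterial detection and the analysis of microarray gene expression profile data.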
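     Exploring biodiversity in microbial ecosystems is an important part of metagenomics research. When facing an unknown environmental sample, rapidly and accurately identifying the phylogenetic composition of the bacteria it contains is of great practical value, so broad-spectrum bacterial detection methods are worth studying. The technology has broad prospects in areas such as the analysis of complex microbial communities and the environmental monitoring of microorganisms. With the continuing growth of 16S rRNA sequence resources and steady advances in oligonucleotide microarray technology, it has become possible to design 16S rRNA microarrays for large-scale bacterial detection and to analyze the phylogenetic composition of microbial populations in unknown environmental samples. The key to designing such a microarray is optimized probe design, and the basis of optimized probe design is accurate prediction of oligonucleotide hybridization behavior. Computational analysis of the thermodynamics of nucleic acid hybridization in solution works well. However, when one strand of the resulting duplex is tethered at one end to the chip surface, this complex immobilization effect strongly affects the kinetic and thermodynamic analysis of how a probe captures its target, and it reduces the accuracy of predicted hybridization stability. Through statistical analysis of a large body of hybridization data, we found that subtracting the hybridization free energies of a perfect match (PM) probe and its corresponding mismatch (MM) probe removes, to some extent, the negative effect of immobilization on computational prediction. Designing a positive control spot for each probe therefore effectively improves the microarray's ability to discriminate specific from non-specific hybridization.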
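     In designing a 16S rRNA microarray for large-scale bacterial detection, all target sequences are divided into groups according to bacterial taxonomy. A taxonomic unit contains multiple copies of the 16S rRNA gene, so group-specific probes must be designed for each taxonomic unit. Most existing methods for group-specific probe design rely on global sequence alignment and have high computational complexity. This dissertation presents an algorithm, OligoSampling, that uses a Markov chain Monte Carlo (MCMC) method to design group-specific probes. The method does not require the sequences to be globally aligned first, and it is more efficient and more flexible.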
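     Being able to design group-specific probes for bacterial taxonomic units is still not enough to solve the design problem of a 16S rRNA microarray for large-scale bacterial detection. For a group of 16S rRNA sequences defined by bacterial taxonomy, the target sequences within the group are similar but not identical, so designing a single group-specific probe that detects every target sequence in the group is difficult. This dissertation therefore goes on to study bacterial detection with multiple probes used in combination. Each probe specifically detects part of the target sequences in the group, and combining the probes by logical OR (a positive signal from any one probe indicates that the target bacterium has been detected) raises coverage. For this probe combination optimization problem, the dissertation proposes a feasible algorithm for combinatorial probe design based on relative entropy and a genetic algorithm. Applying the method to the design of a 16S rRNA microarray for large-scale bacterial detection shows that the resulting group-specific probe sets have high specificity and a low incidence of cross-hybridization.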
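     When a specific probe set is designed for a given bacterium, an unrelated strain that has accumulated multi-base mutations during evolution may hybridize specifically with one of the probes in the set, which interferes with the detection result. However, it is extremely unlikely that a strain outside the group has mutated at several 16S rRNA sites so as to cross-hybridize with several group-specific probes at once. Based on this observation, the dissertation also studies a detection strategy aimed at improving specificity, in which multiple probes are combined by logical AND (the target bacterium is reported only when all probes give a positive signal). We first use the OligoSampling algorithm to design group-specific candidate probes, then combine several candidate probes by logical AND into a detection unit, and finally combine several detection units by logical OR to detect the target bacterium. The results show that combining probes in this way also achieves high coverage while preserving specificity.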
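     Processing of gene expression microarray data generally involves two steps: normalization and identification of differentially expressed genes. The presence of differentially expressed genes, especially when the numbers of up-regulated and down-regulated genes differ greatly, degrades the accuracy of normalization, and the normalization result in turn affects the identification of differentially expressed genes. In a two-step procedure of differential expression identification and normalization, whichever step is performed first passes its error on to the next. This dissertation proposes a new outlier removal method based on an iterative reselection algorithm and applies it to the normalization of expression data. Analyses of simulated and real data show that the method progressively removes the influence of outliers within a single iterative process, effectively improves the accuracy of normalization, and at the same time accurately identifies candidate differentially expressed genes. Applying the method to expression data from Saccharomyces cerevisiae cultured under amino acid starvation, the dissertation reaches a biological conclusion that differs from the original paper: among genes involved in carbohydrate metabolism, anabolic enzymes are induced before catabolic enzymes, rather than the two classes being induced simultaneously.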
The application of DNA microarray technology has revolutionized research in modern biology. The development of the technology is the result of multidisciplinary collaboration. Several critical issues in DNA microarrays, such as design and data analysis, can be resolved only with the support of bioinformatics research. This dissertation focuses on the design of 16S rRNA-based oligonucleotide arrays for large-scale bacterial detection and on the analysis of microarray gene expression datasets.
     Revealing biodiversity in microbial communities is essential in metagenomics research. Techniques for large-scale phylogenetic identification of bacteria are important for the analysis of unknown environmental samples, and they can be applied to the analysis of microbial communities and to environmental surveillance of biological threats. With thousands of 16S rRNA gene sequences available and with advances in oligonucleotide microarray technology, it may become possible to design a 16S rRNA-based oligonucleotide array for large-scale bacterial detection that analyzes the microorganisms in an unknown environmental sample consisting of hundreds of species. The critical issue in designing such an array is finding optimized probes. Optimizing probe design for array-based experiments first requires better prediction of oligonucleotide hybridization behavior. The thermodynamic properties of nucleic acid duplex formation and dissociation in solution are well established. However, duplex formation with surface-immobilized DNA oligonucleotides is less well understood, presumably because of the complex factors that affect the kinetics and thermodynamics of target capture. Statistical analysis of large sets of hybridization data reveals that the negative effect of surface immobilization can be reduced by subtracting the hybridization free energies of the PM (perfect match) and MM (mismatch) oligo-target duplexes. Designing positive controls for each probe helps discriminate specific from non-specific hybridization, and this can be implemented on the basis of hybridization behavior prediction.
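     The effect of the subtraction can be illustrated with a simple additive model. This is only a sketch under an assumption that is not stated in the abstract: that surface immobilization shifts the solution-phase free energies of the PM and MM duplexes of a given probe by roughly the same amount, written here as the hypothetical term $\delta_{\mathrm{surf}}$.

```latex
\begin{align}
\Delta G_{\mathrm{PM}}^{\mathrm{surf}} &\approx \Delta G_{\mathrm{PM}}^{\mathrm{sol}} + \delta_{\mathrm{surf}}, \\
\Delta G_{\mathrm{MM}}^{\mathrm{surf}} &\approx \Delta G_{\mathrm{MM}}^{\mathrm{sol}} + \delta_{\mathrm{surf}}, \\
\Delta\Delta G = \Delta G_{\mathrm{PM}}^{\mathrm{surf}} - \Delta G_{\mathrm{MM}}^{\mathrm{surf}}
  &\approx \Delta G_{\mathrm{PM}}^{\mathrm{sol}} - \Delta G_{\mathrm{MM}}^{\mathrm{sol}}.
\end{align}
```

     If the assumption holds, the surface term cancels in the difference, so the PM-MM contrast can be estimated from solution-phase parameters even when the absolute surface free energies are poorly predicted.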
     In the design of a 16S rRNA-based oligonucleotide array for large-scale bacterial detection, all target sequences are clustered into groups based on taxonomy. A taxonomic unit contains multiple copies of the 16S rRNA gene, so the concept of a cluster-specific or group-specific probe has to be introduced. Many existing strategies for group-specific oligonucleotide probe design depend on the result of a global multiple sequence alignment, which is time-consuming. We present a novel program named OligoSampling that uses an MCMC method to design group-specific oligonucleotide probes. Our method does not need to align the target sequences globally. Furthermore, OligoSampling provides more flexibility and higher speed than software based on global multiple sequence alignment.
     Designing group-specific probes for bacterial taxonomic units is not enough to design a 16S rRNA-based oligonucleotide array for large-scale bacterial detection. In groups of target sequences assembled by taxonomy, the target sequences of each group are homologous but not identical. Finding a unique group-specific probe that specifically detects all target sequences in a group is often difficult. Designing non-unique probes is therefore a reasonable trade-off. Each probe specifically detects the target sequences of a different subgroup. Combining these probes (identification based on disjunctive inference: a positive signal from any one probe identifies the target group) can achieve higher coverage. However, evaluating all possible combinations is time-consuming. We present a feasible algorithm that uses relative entropy and a genetic algorithm (GA) to design group-specific non-unique probes. This scheme has been applied to the design of a 16S rRNA-based oligonucleotide array for large-scale bacterial detection. The results demonstrate that the designed 16S rRNA-based probe sets have high coverage and low cross-hybridization.
     We found a considerable risk that 'false' identities occur within the 16S rRNA gene copies of unrelated microorganisms as a result of multiple 16S rRNA gene mutations during the course of evolution. At the same time, we believe it is highly unlikely that 'false' identities evolved at multiple 16S rRNA sites in phylogenetically distant microorganisms. Based on this observation, we also propose an identification scheme based on conjunctive inference (the target taxon is identified only when all probes exhibit a positive signal). We applied OligoSampling, developed in this dissertation, to design group-specific probe candidates and combined multiple probe candidates by conjunctive inference to form an identification unit. Multiple identification units were then combined by disjunctive inference to identify the target group. The results demonstrate that combining multiple probes in this way can improve coverage and specificity.
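     The two inference rules can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: the function names, the probe names and the `hits` mapping (probe name to the set of sequences it hybridizes to) are all hypothetical, and every identification unit is assumed to be non-empty.

```python
from typing import Dict, List, Set


def unit_positive(unit: List[str], signals: Dict[str, bool]) -> bool:
    """Conjunctive inference: every probe in the unit must be positive."""
    return all(signals.get(probe, False) for probe in unit)


def group_detected(units: List[List[str]], signals: Dict[str, bool]) -> bool:
    """Disjunctive inference: one positive identification unit suffices."""
    return any(unit_positive(unit, signals) for unit in units)


def coverage(units: List[List[str]],
             hits: Dict[str, Set[str]],
             targets: Set[str]) -> float:
    """Fraction of target sequences matched by at least one whole unit."""
    covered: Set[str] = set()
    for unit in units:
        # A sequence is matched by a unit only if all of its probes hit it.
        matched = set.intersection(*(hits[probe] for probe in unit))
        covered |= matched & targets
    return len(covered) / len(targets)


# Hypothetical example: x1 is a non-target sequence that cross-hybridizes
# with one probe in each unit, but never with a complete unit.
hits = {
    "p1": {"s1", "s2", "x1"},
    "p2": {"s1", "s2"},
    "p3": {"s3", "s4"},
    "p4": {"s3", "s4", "x1"},
}
units = [["p1", "p2"], ["p3", "p4"]]
targets = {"s1", "s2", "s3", "s4"}
```

     In this toy setup the two units together cover all four targets, while the signal pattern produced by `x1` (only `p1` and `p4` positive) does not trigger a detection, which is the behavior the conjunctive step is meant to provide.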
     There are two main steps in the analysis of microarray gene expression data: normalization and identification of differentially expressed genes. Differentially expressed genes have a negative impact on normalization, especially when the numbers of over-expressed and under-expressed genes differ greatly. Conversely, imprecise normalization can cause the identification of differentially expressed genes to fail. In a two-step statistical procedure, whichever of normalization and identification is performed first passes its errors on to the other. We propose a new iterative reselection algorithm for outlier removal and apply it to the normalization of microarray gene expression data. Simulated and real datasets were analyzed. The results demonstrate that our approach can eliminate the impact of outliers in an iterative reselection process and significantly improves the precision of normalization. As a result, candidates for differential expression can be identified efficiently at the same time. In particular, after normalization with our method, we arrived at new biological explanations that differ from Gasch’s original analysis of the same cDNA microarray datasets, which were obtained in a study of the transcriptional response of Saccharomyces cerevisiae to amino acid starvation. For genes involved in carbohydrate metabolism, we found that the induction of synthetic enzymes precedes the induction of catabolic enzymes, instead of the two being induced simultaneously.
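     The idea of alternating between outlier selection and normalization can be sketched as follows. The abstract does not give the details of the iterative reselection algorithm, so this is only an illustration under stated assumptions: normalization is reduced to subtracting a single global offset from the log-ratios, the spread is estimated with the median absolute deviation, and the function name and the parameters `k` and `max_iter` are hypothetical.

```python
import numpy as np


def iterative_reselection_normalize(log_ratios, k=3.0, max_iter=50):
    """Return (normalized log-ratios, boolean mask of flagged outliers).

    Assumed scheme, not the dissertation's exact algorithm: estimate the
    center and spread from the currently selected genes, reselect from ALL
    genes those within k spreads of the center, and repeat until stable.
    """
    m = np.asarray(log_ratios, dtype=float)
    keep = np.ones(m.shape, dtype=bool)  # start by trusting every gene
    for _ in range(max_iter):
        center = np.median(m[keep])
        # Robust spread: scaled median absolute deviation of selected genes.
        spread = 1.4826 * np.median(np.abs(m[keep] - center))
        new_keep = np.abs(m - center) <= k * spread
        # Stop if the selection is empty or no longer changes.
        if not new_keep.any() or np.array_equal(new_keep, keep):
            break
        keep = new_keep
    offset = np.median(m[keep])  # normalization constant from non-outliers
    return m - offset, ~keep
```

     Genes flagged as outliers in the final pass are the candidates for differential expression, so normalization and candidate identification come out of the same loop rather than being run as two separate steps. An intensity-dependent fit could replace the global offset without changing the overall structure.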
References
[1] Watson J D, Crick F H C. A Structure for Deoxyribose Nucleic Acid[J]. Nature, 1953, 171: 737-738
    [2] Schena M. Microarray Analysis (reprint edition; Chinese title: 生物芯片分析)[M]. Beijing: Science Press, 2003.
    [3] Futschik M, Crompton T. Model selection and efficiency testing for normalization of cDNA microarray data[J]. Genome Biol, 2004, 5(8): R60
    [4] Golub T R, Slonim D K, Tamayo P, Huard C, Gaasenbeek M, Mesirov J P, Coller H, Loh M L, Downing J R, Caligiuri M A, Bloomfield C D, and Lander E S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286(5439): 531-537
    [5] Lau S C, Liu W T. Recent advances in molecular techniques for the detection of phylogenetic markers and functional genes in microbial communities[J]. FEMS Microbiol Lett, 2007, 275(2): 183-190
    [6] Kuske C R, Barns S M, Grow C C, Merrill L, and Dunbar J. Environmental survey for four pathogenic bacteria and closely related species using phylogenetic and functional genes[J]. J Forensic Sci, 2006, 51(3): 548-558
    [7] Yang L L, Zhi X Y, Li W J. Phylogenetic analysis of Nocardiopsis species based on 16S rRNA, gyrB, sod and rpoB gene sequences[J]. Wei Sheng Wu Xue Bao, 2007, 47(6): 951-955
    [8] Tayeb L A, Lefevre M, Passet V, Diancourt L, Brisse S, and Grimont P A. Comparative phylogenies of Burkholderia, Ralstonia, Comamonas, Brevundimonas and related organisms derived from rpoB, gyrB and rrs gene sequences[J]. Res Microbiol, 2007
    [9] Woese C R, Fox G E. Phylogenetic structure of the prokaryotic domain: the primary kingdoms[J]. Proc Natl Acad Sci U S A, 1977, 74(11): 5088-5090
    [10] Olsen G J, Overbeek R, Larsen N, Marsh T L, McCaughey M J, Maciukenas M A, Kuan W M, Macke T J, Xing Y, and Woese C R. The Ribosomal Database Project[J]. Nucleic Acids Res, 1992, 20 Suppl: 2199-2200
    [11] Maidak B L, Cole J R, Lilburn T G, Parker C T, Jr., Saxman P R, Farris R J, Garrity G M, Olsen G J, Schmidt T M, and Tiedje J M. The RDP-II (Ribosomal Database Project)[J]. Nucleic Acids Res, 2001, 29(1): 173-174
    [12] Cole J R, Chai B, Farris R J, Wang Q, Kulam S A, McGarrell D M, Garrity G M, and Tiedje J M. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis[J]. Nucleic Acids Res, 2005, 33(Database issue): D294-296
    [13] Cole J R, Chai B, Farris R J, Wang Q, Kulam-Syed-Mohideen A S, McGarrell D M, Bandela A M, Cardenas E, Garrity G M, and Tiedje J M. The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data[J]. Nucleic Acids Res, 2007, 35(Database issue): D169-172
    [14] Fodor S P, Rava R P, Huang X C, Pease A C, Holmes C P, and Adams C L. Multiplexed biochemical assays with biological chips[J]. Nature, 1993, 364(6437): 555-556
    [15] Pease A C, Solas D, Sullivan E J, Cronin M T, Holmes C P, and Fodor S P. Light-generated oligonucleotide arrays for rapid DNA sequence analysis[J]. Proc Natl Acad Sci U S A, 1994, 91(11): 5022-5026
    [16] You Y, Moreira B G, Behlke M A, and Owczarzy R. Design of LNA probes that improve mismatch discrimination[J]. Nucleic Acids Res, 2006, 34(8): e60
    [17] Warsen A E, Krug M J, LaFrentz S, Stanek D R, Loge F J, and Call D R. Simultaneous discrimination between 15 fish pathogens by using 16S ribosomal DNA PCR and DNA microarrays[J]. Appl Environ Microbiol, 2004, 70(7): 4216-4221
    [18] Jin D Z, Wen S Y, Chen S H, Lin F, and Wang S Q. Detection and identification of intestinal pathogens in clinical specimens using DNA microarrays[J]. Mol Cell Probes, 2006, 20(6): 337-347
    [19] Taroncher-Oldenburg G, Ward B B. Oligonucleotide microarrays for the study of coastal microbial communities[J]. Methods Mol Biol, 2007, 353: 301-315
    [20] Call D R. Challenges and opportunities for pathogen detection using DNA microarrays[J]. Crit Rev Microbiol, 2005, 31(2): 91-99
    [21] Ludwig W, Amann R, Martinez-Romero E, Schönhuber W, Bauer S, Neef A, and Schleifer K-H. rRNA based identification and detection systems for rhizobia and other bacteria[J]. Plant and Soil, 1998, 204: 1-19
    [22] Amann R, Ludwig W. Ribosomal RNA-targeted nucleic acid probes for studies in microbial ecology[J]. FEMS Microbiol Rev, 2000, 24(5): 555-565
    [23] Sun C P, Liao J C, Zhang Y H, Gau V, Mastali M, Babbitt J T, Grundfest W S, Churchill B M, McCabe E R, and Haake D A. Rapid, species-specific detection of uropathogen 16S rDNA and rRNA at ambient temperature by dot-blot hybridization and an electrochemical sensor array[J]. Mol Genet Metab, 2005, 84(1): 90-99
    [24] Hong B X, Jiang L F, Hu Y S, Fang D Y, and Guo H Y. Application of oligonucleotide array technology for the rapid detection of pathogenic bacteria of foodborne infections[J]. J Microbiol Methods, 2004, 58(3): 403-411
    [25] Vora G J, Meador C E, Stenger D A, and Andreadis J D. Nucleic acid amplification strategies for DNA microarray-based pathogen detection[J]. Appl Environ Microbiol, 2004, 70(5): 3047-3054
    [26] Wong C W, Heng C L, Wan Yee L, Soh S W, Kartasasmita C B, Simoes E A, Hibberd M L, Sung W K, and Miller L D. Optimization and clinical validation of a pathogen detection microarray[J]. Genome Biol, 2007, 8(5): R93
    [27] Martin F H, Castro M M, Aboul-ela F, and Tinoco I, Jr. Base pairing involving deoxyinosine: implications for probe design[J]. Nucleic Acids Res, 1985, 13(24): 8927-8938
    [28] Groebe D R, Uhlenbeck O C. Characterization of RNA hairpin loop stability[J]. Nucleic Acids Res, 1988, 16(24): 11725-11735
    [29] Mitsuhashi M, Cooper A, Ogura M, Shinagawa T, Yano K, and Hosokawa T. Oligonucleotide probe design--a new approach[J]. Nature, 1994, 367(6465): 759-761
    [30] Breslauer K J, Frank R, Blocker H, and Marky L A. Predicting DNA duplex stability from the base sequence[J]. Proc Natl Acad Sci U S A, 1986, 83(11): 3746-3750
    [31] Allawi H T, SantaLucia J, Jr. Nearest neighbor thermodynamic parameters for internal G.A mismatches in DNA[J]. Biochemistry, 1998, 37(8): 2170-2179
    [32] Allawi H T, SantaLucia J, Jr. Thermodynamics and NMR of internal G.T mismatches in DNA[J]. Biochemistry, 1997, 36(34): 10581-10594
    [33] Allawi H T, SantaLucia J, Jr. Nearest-neighbor thermodynamics of internal A.C mismatches in DNA: sequence dependence and pH effects[J]. Biochemistry, 1998, 37(26): 9435-9444
    [34] Allawi H T, SantaLucia J, Jr. Thermodynamics of internal C.T mismatches in DNA[J]. Nucleic Acids Res, 1998, 26(11): 2694-2701
    [35] SantaLucia J, Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics[J]. Proc Natl Acad Sci U S A, 1998, 95(4): 1460-1465
    [36] Urakawa H, El Fantroussi S, Smidt H, Smoot J C, Tribou E H, Kelly J J, Noble P A, and Stahl D A. Optimization of single-base-pair mismatch discrimination in oligonucleotide microarrays[J]. Appl Environ Microbiol, 2003, 69(5): 2848-2856
    [37] Pozhitkov A, Noble P A, Domazet-Loso T, Nolte A W, Sonnenberg R, Staehler P, Beier M, and Tautz D. Tests of rRNA hybridization to microarrays suggest that hybridization characteristics of oligonucleotide probes for species discrimination cannot be predicted[J]. Nucleic Acids Res, 2006, 34(9): e66
    [38] Matveeva O V, Shabalina S A, Nemtsov V A, Tsodikov A D, Gesteland R F, and Atkins J F. Thermodynamic calculations and statistical correlations for oligo-probes design[J]. Nucleic Acids Res, 2003, 31(14): 4211-4217
    [39] Mathews D H, Burkard M E, Freier S M, Wyatt J R, and Turner D H. Predicting oligonucleotide affinity to nucleic acid targets[J]. RNA, 1999, 5(11): 1458-1469
    [40] Hofacker I L, Fontana W, Stadler P F, Bonhoeffer L S, Tacker M, and Schuster P. Fast Folding and Comparison of RNA Secondary Structures[J]. Monatshefte für Chemie, 1994, 125: 167-188
    [41] Hofacker I L. Vienna RNA secondary structure server[J]. Nucleic Acids Res, 2003, 31(13): 3429-3431
    [42] Zuker M. Mfold web server for nucleic acid folding and hybridization prediction[J]. Nucleic Acids Res, 2003, 31(13): 3406-3415
    [43] Mathews D H, Disney M D, Childs J L, Schroeder S J, Zuker M, and Turner D H. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure[J]. Proc Natl Acad Sci U S A, 2004, 101(19): 7287-7292
    [44] Nielsen H B, Wernersson R, Knudsen S. Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays[J]. Nucleic Acids Res, 2003, 31(13): 3491-3496
    [45] Emrich S J, Lowe M, Delcher A L. PROBEmer: A web-based software tool for selecting optimal DNA oligos[J]. Nucleic Acids Res, 2003, 31(13): 3746-3750
    [46] Rouillard J M, Zuker M, Gulari E. OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach[J]. Nucleic Acids Res, 2003, 31(12): 3057-3062
    [47] Wang X, Seed B. Selection of oligonucleotide probes for protein coding sequences[J]. Bioinformatics, 2003, 19(7): 796-802
    [48] Reymond N, Charles H, Duret L, Calevro F, Beslon G, and Fayard J M. ROSO: optimizing oligonucleotide probes for microarrays[J]. Bioinformatics, 2004, 20(2): 271-273
    [49] Gordon P M, Sensen C W. Osprey: a comprehensive tool employing novel methods for the design of oligonucleotides for DNA sequencing and microarrays[J]. Nucleic Acids Res, 2004, 32(17): e133
    [50] Chou H H, Hsia A P, Mooney D L, and Schnable P S. Picky: oligo microarray design for large genomes[J]. Bioinformatics, 2004, 20(17): 2893-2902
    [51] DeSantis T Z, Hugenholtz P, Keller K, Brodie E L, Larsen N, Piceno Y M, Phan R, and Andersen G L. NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes[J]. Nucleic Acids Res, 2006, 34(Web Server issue): W394-399
    [52] Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Forster W, Brettske I, Gerber S, Ginhart A W, Gross O, Grumann S, Hermann S, Jost R, Konig A, Liss T, Lussmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, and Schleifer K H. ARB: a software environment for sequence data[J]. Nucleic Acids Res, 2004, 32(4): 1363-1371
    [53] Kumar Y, Westram R, Behrens S, Fuchs B, Glockner F O, Amann R, Meier H, and Ludwig W. Graphical representation of ribosomal RNA probe accessibility data using ARB software package[J]. BMC Bioinformatics, 2005, 6: 61
    [54] Kumar Y, Westram R, Kipfer P, Meier H, and Ludwig W. Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package[J]. BMC Bioinformatics, 2006, 7: 240
    [55] Lawrence C E, Altschul S F, Boguski M S, Liu J S, Neuwald A F, and Wootton J C. Detecting subtle sequence signals: A gibbs sampling strategy for multiple alignment[J]. Science, 1993, 262: 208-214
    [56] Lawrence C E, Reilly A A. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences[J]. Proteins, 1990, 7(1): 41-51
    [57] Tiquia S M, Wu L, Chong S C, Passovets S, Xu D, Xu Y, and Zhou J. Evaluation of 50-mer oligonucleotide arrays for detecting microbial populations in environmental samples[J]. Biotechniques, 2004, 36(4): 664-670, 672, 674-675
    [58] Gill S R, Pop M, Deboy R T, Eckburg P B, Turnbaugh P J, Samuel B S, Gordon J I, Relman D A, Fraser-Liggett C M, and Nelson K E. Metagenomic analysis of the human distal gut microbiome[J]. Science, 2006, 312(5778): 1355-1359
    [59] Backhed F, Ley R E, Sonnenburg J L, Peterson D A, and Gordon J I. Host-bacterial mutualism in the human intestine[J]. Science, 2005, 307(5717): 1915-1920
    [60] Turnbaugh P J, Ley R E, Hamady M, Fraser-Liggett C M, Knight R, and Gordon J I. The human microbiome project[J]. Nature, 2007, 449(7164): 804-810
    [61] Dupre J, O'Malley M A. Metagenomics and biological ontology[J]. Stud Hist Philos Biol Biomed Sci, 2007, 38(4): 834-846
    [62] Pennisi E. Metagenomics. Massive microbial sequence project proposed[J]. Science, 2007, 315(5820): 1781
    [63] Jurkowski A, Reid A H, Labov J B. Metagenomics: a call for bringing a new science into the classroom (while it's still new)[J]. CBE Life Sci Educ, 2007, 6(4): 260-265
    [64] Kurokawa K, Itoh T, Kuwahara T, Oshima K, Toh H, Toyoda A, Takami H, Morita H, Sharma V K, Srivastava T P, Taylor T D, Noguchi H, Mori H, Ogura Y, Ehrlich D S, Itoh K, Takagi T, Sakaki Y, Hayashi T, and Hattori M. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes[J]. DNA Res, 2007, 14(4): 169-181
    [65] Hooper L V, Gordon J I. Commensal host-bacterial relationships in the gut[J]. Science, 2001, 292(5519): 1115-1118
    [66] Turnbaugh P J, Ley R E, Mahowald M A, Magrini V, Mardis E R, and Gordon J I. An obesity-associated gut microbiome with increased capacity for energy harvest[J]. Nature, 2006, 444(7122): 1027-1031
    [67] Iweala O I, Nagler C R. Immune privilege in the gut: the establishment and maintenance of non-responsiveness to dietary antigens and commensal flora[J]. Immunol Rev, 2006, 213: 82-100
    [68] Eckburg P B, Bik E M, Bernstein C N, Purdom E, Dethlefsen L, Sargent M, Gill S R, Nelson K E, and Relman D A. Diversity of the human intestinal microbial flora[J]. Science, 2005, 308(5728): 1635-1638
    [69] Goodacre R. Metabolomics of a superorganism[J]. J Nutr, 2007, 137(1 Suppl): 259S-266S
    [70] Egert M, de Graaf A A, Smidt H, de Vos W M, and Venema K. Beyond diversity: functional microbiomics of the human colon[J]. Trends Microbiol, 2006, 14(2): 86-91
    [71] Belgrader P, Benett W, Hadley D, Richards J, Stratton P, Mariella R, Jr., and Milanovich F. PCR detection of bacteria in seven minutes[J]. Science, 1999, 284(5413): 449-450
    [72] Harmsen D, Singer C, Rothganger J, Tonjum T, de Hoog G S, Shah H, Albert J, and Frosch M. Diagnostics of neisseriaceae and moraxellaceae by ribosomal DNA sequencing: ribosomal differentiation of medical microorganisms[J]. J Clin Microbiol, 2001, 39(3): 936-942
    [73] Torsvik V, Ovreas L. Microbial diversity and function in soil: from genes to ecosystems[J]. Curr Opin Microbiol, 2002, 5(3): 240-245
    [74] Blackwood K S, Turenne C Y, Harmsen D, and Kabani A M. Reassessment of sequence-based targets for identification of bacillus species[J]. J Clin Microbiol, 2004, 42(4): 1626-1630
    [75] Lehner A, Loy A, Behr T, Gaenge H, Ludwig W, Wagner M, and Schleifer K H. Oligonucleotide microarray for identification of Enterococcus species[J]. FEMS Microbiol Lett, 2005, 246(1): 133-142
    [76] Bavykin S G, Mikhailovich V M, Zakharyev V M, Lysov Y P, Kelly J J, Alferov O S, Gavin I M, Kukhtin A V, Jackman J, Stahl D A, Chandler D, and Mirzabekov A D. Discrimination of Bacillus anthracis and closely related microorganisms by analysis of 16S and 23S rRNA with oligonucleotide microarray[J]. Chem Biol Interact, 2007
    [77] Majtan T, Bukovska G, Timko J. DNA microarrays--techniques and applications in microbial systems[J]. Folia Microbiol (Praha), 2004, 49(6): 635-664
    [78] Ye R W, Wang T, Bedzyk L, and Croker K M. Applications of DNA microarrays in microbial systems[J]. J Microbiol Methods, 2001, 47(3): 257-272
    [79] Zhou J. Microarrays for bacterial detection and microbial community analysis[J]. Curr Opin Microbiol, 2003, 6(3): 288-294
    [80] Bodrossy L, Sessitsch A. Oligonucleotide microarrays in microbial diagnostics[J]. Curr Opin Microbiol, 2004, 7(3): 245-254
    [81] Chung W H, Rhee S K, Wan X F, Bae J W, Quan Z X, and Park Y H. Design of long oligonucleotide probes for functional gene detection in a microbial community[J]. Bioinformatics, 2005, 21(22): 4092-4100
    [82] Gentry T J, Wickham G S, Schadt C W, He Z, and Zhou J. Microarray applications in microbial ecology research[J]. Microb Ecol, 2006, 52(2): 159-175
    [83] Lemarchand K, Masson L, Brousseau R. Molecular biology and DNA microarray technology for microbial quality monitoring of water[J]. Crit Rev Microbiol, 2004, 30(3): 145-172
    [84] Rudi K, Zimonja M, Trosvik P, and Naes T. Use of multivariate statistics for 16S rRNA gene analysis of microbial communities[J]. Int J Food Microbiol, 2007, 120(1-2): 95-99
    [85] Harmsen D, Rothganger J, Frosch M, and Albert J. RIDOM: Ribosomal Differentiation of Medical Micro-organisms Database[J]. Nucleic Acids Res, 2002, 30(1): 416-417
    [86] Garcia-Martinez J, Bescos I, Rodriguez-Sala J J, and Rodriguez-Valera F. RISSC: a novel database for ribosomal 16S-23S RNA genes spacer regions[J]. Nucleic Acids Res, 2001, 29(1): 178-180
    [87] Harmsen D, Dostal S, Roth A, Niemann S, Rothganger J, Sammeth M, Albert J, Frosch M, and Richter E. RIDOM: comprehensive and public sequence database for identification of Mycobacterium species[J]. BMC Infect Dis, 2003, 3: 26
    [88] Watanabe K, Nelson J, Harayama S, and Kasai H. ICB database: the gyrB database for identification and classification of bacteria[J]. Nucleic Acids Res, 2001, 29(1): 344-345
    [89] Cannone J J, Subramanian S, Schnare M N, Collett J R, D'Souza L M, Du Y, Feng B, Lin N, Madabusi L V, Muller K M, Pande N, Shang Z, Yu N, and Gutell R R. The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs[J]. BMC Bioinformatics, 2002, 3: 2
    [90] Wuyts J, Perriere G, Van De Peer Y. The European ribosomal RNA database[J]. Nucleic Acids Res, 2004, 32(Database issue): D101-103
    [91] Wilson K H, Wilson W J, Radosevich J L, DeSantis T Z, Viswanathan V S, Kuczmarski T A, and Andersen G L. High-density microarray of small-subunit ribosomal DNA probes[J]. Appl Environ Microbiol, 2002, 68(5): 2535-2541
    [92] Loy A, Bodrossy L. Highly parallel microbial diagnostics using oligonucleotide microarrays[J]. Clin Chim Acta, 2006, 363(1-2): 106-119
    [93] Huyghe A, Francois P, Charbonnier Y, Tangomo-Bento M, Bonetti E J, Paster B J, Bolivar I, Baratti-Mayer D, Pittet D, and Schrenzel J. Novel microarray design strategy to study complex bacterial communities[J]. Appl Environ Microbiol, 2008, 74(6): 1876-1885
    [94] Herwig R, Schmitt A O, Steinfath M, O'Brien J, Seidel H, Meier-Ewert S, Lehrach H, and Radelof U. Information theoretical probe selection for hybridisation experiments[J]. Bioinformatics, 2000, 16(10): 890-898
    [95] Borneman J, Chrobak M, Della Vedova G, Figueroa A, and Jiang T. Probe selection algorithms with applications in the analysis of microbial communities[J]. Bioinformatics, 2001, 17 Suppl 1: S39-48
    [96] Kaderali L, Schliep A. Selecting signature oligonucleotides to identify organisms using DNA arrays[J]. Bioinformatics, 2002, 18(10): 1340-1349
    [97] DeSantis T Z, Dubosarskiy I, Murray S R, and Andersen G L. Comprehensive aligned sequence construction for automated design of effective probes (CASCADE-P) using 16S rDNA[J]. Bioinformatics, 2003, 19(12): 1461-1468
    [98] Mei R, Hubbell E, Bekiranov S, Mittmann M, Christians F C, Shen M M, Lu G, Fang J, Liu W M, Ryder T, Kaplan P, Kulp D, and Webster T A. Probe selection for high-density oligonucleotide arrays[J]. Proc Natl Acad Sci U S A, 2003, 100(20): 11237-11242
    [99] He Z, Wu L, Li X, Fields M W, and Zhou J. Empirical establishment of oligonucleotide probe design criteria[J]. Appl Environ Microbiol, 2005, 71(7): 3753-3760
    [100] Li X, He Z, Zhou J. Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation[J]. Nucleic Acids Res, 2005, 33(19): 6114-6123
    [101] Liebich J, Schadt C W, Chong S C, He Z, Rhee S K, and Zhou J. Improvement of oligonucleotide probe design criteria for functional gene microarrays in environmental applications[J]. Appl Environ Microbiol, 2006, 72(2): 1688-1691
    [102] Gupta R S. The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes[J]. FEMS Microbiol Rev, 2000, 24(4): 367-402
    [103] Stahl D A, Lane D J, Olsen G J, and Pace N R. Analysis of hydrothermal vent-associated symbionts by ribosomal RNA sequences[J]. Science, 1984, 224: 409-411
    [104] Stahl D A, Lane D J, Olsen G J, and Pace N R. Characterization of a Yellowstone hot spring microbial community by 5S rRNA sequences[J]. Appl Environ Microbiol, 1985, 49: 1379-1384
    [105] Lane D J, Stahl D A, Olsen G J, Heller D J, and Pace N R. Phylogenetic analysis of the genera Thiobacillus and Thiomicrospira by 5S rRNA sequences[J]. J Bacteriol, 1985, 163: 75-81
    [106] Woese C R. Bacterial evolution[J]. Microbiol Rev, 1987, 51: 221-271
    [107] Woese C R, Kandler O, Wheelis M L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya[J]. Proc Natl Acad Sci U S A, 1990, 87(12): 4576-4579
    [108] Hugenholtz P, Goebel B M, Pace N R. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity[J]. J Bacteriol, 1998, 180(18): 4765-4774
    [109] Amann R I, Ludwig W, Schleifer K H. Phylogenetic identification and in situ detection of individual microbial cells without cultivation[J]. Microbiol Rev, 1995, 59(1): 143-169
    [110] Baggerly K A, Coombes K R, Hess K R, Stivers D N, Abruzzo L V, and Zhang W. Identifying differentially expressed genes in cDNA microarray experiments[J]. J Comput Biol, 2001, 8(6): 639-659
    [111] Callow M J, Dudoit S, Gong E L, Speed T P, and Rubin E M. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice[J]. Genome Res, 2000, 10(12): 2022-2029
    [112] Conlon E M, Eichenberger P, Liu J S. Determining and analyzing differentially expressed genes from cDNA microarray experiments with complementary designs[J]. Journal of Multivariate Analysis, 2004, 90(1): 1-18
    [113] Cui X, Hwang J T, Qiu J, Blades N J, and Churchill G A. Improved statistical tests for differential gene expression by shrinking variance components estimates[J]. Biostatistics, 2005, 6(1): 59-75
    [114] Dean N, Raftery A E. Normal uniform mixture differential gene expression detection for cDNA microarrays[J]. BMC Bioinformatics, 2005, 6: 173
    [115] Dudoit S, Yang Y H, Speed T P, and Callow M J. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments[J]. Statistica Sinica, 2002, 12(1): 111-139
    [116] Gottardo R, Pannucci J A, Kuske C R, and Brettin T. Statistical analysis of microarray data: a Bayesian approach[J]. Biostatistics, 2003, 4(4): 597-620
    [117] Kerr M K, Churchill G A. Experimental design for gene expression microarrays[J]. Biostatistics, 2001, 2(2): 183-201
    [118] Kerr M K, Martin M, Churchill G A. Analysis of variance for gene expression microarray data[J]. J Comput Biol, 2000, 7(6): 819-837
    [119] Loguinov A V, Mian I S, Vulpe C D. Exploratory differential gene expression analysis in microarray experiments with no or limited replication[J]. Genome Biol, 2004, 5(3): R18
    [120] Tsai C A, Chen Y J, Chen J J. Testing for differentially expressed genes with microarray data[J]. Nucleic Acids Res, 2003, 31(9): e52
    [121] Tseng G C, Oh M K, Rohlin L, Liao J C, and Wong W H. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects[J]. Nucleic Acids Res, 2001, 29(12): 2549-2557
    [122] Tsodikov A, Szabo A, Jones D. Adjustments and measures of differential expression for microarray data[J]. Bioinformatics, 2002, 18(2): 251-260
    [123] Draper N R, Smith H. Applied Regression Analysis[M]. New York: Wiley, 1998
    [124] DeRisi J, Penland L, Brown P O, Bittner M L, Meltzer P S, Ray M, Chen Y, Su Y A, and Trent J M. Use of a cDNA microarray to analyse gene expression patterns in human cancer[J]. Nature Genetics, 1996, 14: 457-460
    [125] Chen Y, Dougherty E R, Bittner M L. Ratio-based decisions and the quantitative analysis of cDNA microarray images[J]. Journal of Biomedical Optics, 1997, 2: 364-374
    [126] Gasch A P, Spellman P T, Kao C M, Carmel-Harel O, Eisen M B, Storz G, Botstein D, and Brown P O. Genomic Expression Programs in the Response of Yeast Cells to Environmental Change[J]. Molecular Biology of the Cell, 2000, 11(12): 4241-4257
    [127] Goryachev A B, Macgregor P F, Edwards A M. Unfolding of microarray data[J]. J Comput Biol, 2001, 8(4): 443-461
    [128] Sapir M, Churchill G A. Estimating the posterior probability of differential gene expression from microarray data. The Jackson Laboratory, 2000
    [129] Newton M A, Kendziorski C M, Richmond C S, Blattner F R, and Tsui K W. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data[J]. J Comput Biol, 2001, 8(1): 37-52
    [130] Zhao Y, Li M C, Simon R. An adaptive method for cDNA microarray normalization[J]. BMC Bioinformatics, 2005, 6: 28
    [131] Metz C E, Herman B A, Shen J H. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data[J]. Stat Med, 1998, 17(9): 1033-1053
    [132] Hanley J A, McNeil B J. The meaning and use of the area under a receiver operating characteristic (ROC) curve[J]. Radiology, 1982, 143(1): 29-36
    [133] Allison D B, Cui X, Page G P, and Sabripour M. Microarray data analysis: from disarray to consolidation and consensus[J]. Nat Rev Genet, 2006, 7(1): 55-65
    [134] Gruvberger-Saal S K, Cunliffe H E, Carr K M, and Hedenfalk I A. Microarrays in breast cancer research and clinical practice--the future lies ahead[J]. Endocr Relat Cancer, 2006, 13(4): 1017-1031
    [135] Li M, Wang B, Zhang M, Rantalainen M, Wang S, Zhou H, Zhang Y, Shen J, Pang X, Zhang M, Wei H, Chen Y, Lu H, Zuo J, Su M, Qiu Y, Jia W, Xiao C, Smith L M, Yang S, Holmes E, Tang H, Zhao G, Nicholson J K, Li L, and Zhao L. Symbiotic gut microbes modulate human metabolic phenotypes[J]. Proc Natl Acad Sci U S A, 2008, 105(6): 2117-2122
    [136] Shi L, Reid L H, Jones W D, Shippy R, Warrington J A, Baker S C, Collins P J, de Longueville F, Kawasaki E S, Lee K Y, Luo Y, Sun Y A, Willey J C, Setterquist R A, Fischer G M, Tong W, Dragan Y P, Dix D J, Frueh F W, Goodsaid F M, Herman D, Jensen R V, Johnson C D, Lobenhofer E K, Puri R K, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber P K, Zhang L, Amur S, Bao W, Barbacioru C C, Lucas A B, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao X M, Cebula T A, Chen J J, Cheng J, Chu T M, Chudin E, Corson J, Corton J C, Croner L J, Davies C, Davison T S, Delenstarr G, Deng X, Dorris D, Eklund A C, Fan X H, Fang H, Fulmer-Smentek S, Fuscoe J C, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje P K, Han J, Han T, Harbottle H C, Harris S C, Hatchwell E, Hauser C A, Hester S, Hong H, Hurban P, Jackson S A, Ji H, Knight C R, Kuo W P, LeClerc J E, Levy S, Li Q Z, Liu C, Liu Y, Lombardi M J, Ma Y, Magnuson S R, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr M S, Osborn T W, Papallo A, Patterson T A, Perkins R G, Peters E H, Peterson R, Philips K L, Pine P S, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig B A, Samaha R R, Schena M, Schroth G P, Shchegrova S, Smith D D, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson K L, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker S J, Wang S J, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W, Jr. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements[J]. Nat Biotechnol, 2006, 24(9): 1151-1161
    [137] Asare A L, Gao Z, Carey V J, Wang R, and Seyfert-Margolis V. Power enhancement via multivariate outlier testing with gene expression arrays[J]. Bioinformatics, 2009, 25(1): 48-53
    [138] Cleveland W S. Robust locally weighted regression and smoothing scatterplots[J]. J. Amer. Statist. Assoc, 1979, 74: 829-836
    [139] Cleveland W S. LOWESS: A program for smoothing scatterplots by robust locally weighted regression[J]. The American Statistician, 1981, 35: 54
    [140] Jornsten R, Wang H Y, Welsh W J, and Ouyang M. DNA microarray data imputation and significance analysis of differential expression[J]. Bioinformatics, 2005, 21(22): 4155-4161
    [141] Park T, Yi S G, Lee S, Lee S Y, Yoo D H, Ahn J I, and Lee Y S. Statistical tests for identifying differentially expressed genes in time-course microarray experiments[J]. Bioinformatics, 2003, 19(6): 694-703
    [142] McLachlan G J, Bean R W, Jones L B. A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays[J]. Bioinformatics, 2006, 22(13): 1608-1615
    [143] Wu B. Differential gene expression detection and sample classification using penalized linear regression models[J]. Bioinformatics, 2006, 22(4): 472-476
