Research on Models and Algorithms for Disease Relation Mining from Biomedical Text
Abstract
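The literature of the biomedical field records a large body of results and experimental findings. Biomedical text mining, an active research topic, can quickly and effectively extract relevant knowledge from massive collections of literature. Its techniques include information retrieval, text classification, named entity recognition, relation extraction and hypothesis generation. With the rapid development of gene technology, there is an urgent wish to understand the mechanisms of disease at the molecular level. Mining disease relations in the biomedical literature, building disease networks and uncovering hidden disease-related information gives biomedical scientists a basis for hypothesis generation, and is of great significance for human development, disease prevention and the development of new drugs. Building on good performance in biomedical named entity recognition, this work first presents an ontology annotation method for diseases and other entities; texts are then classified and annotated, after which relation extraction and hypothesis generation are carried out to predict the relationships between diseases and other entities.
     Existing biomedical named entity recognition methods perform entity boundary detection and semantic labeling within a single model. In addition, biomedical named entities are often long, so entity-level features are more natural for the recognition task than word-level features. A named entity recognition method based on two-layer semi-Markov conditional random fields is therefore proposed, which splits the labeling task into two phases. In the first phase, named entities and non-entities are detected and labeled C and O respectively. In the second phase, the named entities are labeled with specific entity types such as protein, DNA, RNA, Cell_line and Cell_type. New, useful features are mined for each phase. Because some features apply to only one phase, the two-layer model greatly reduces the feature dimensionality. Experiments confirm the effectiveness of the algorithm compared with existing ones: the method based on two-layer semi-Markov conditional random fields reaches an F-score of 74.64% on the JNLPBA2004 corpus.
     To address the unclear types and low precision of disease named entity recognition in the biomedical literature, an annotation method based on a disease ontology is proposed, which uses a standard vocabulary to annotate and normalize disease concepts. A two-layer semi-Markov conditional random fields model recognizes disease entities, including their position in the text and their identification. The recognized diseases are then annotated by computing the similarity between disease entities and the concepts in the disease ontology. Finally, according to this similarity, disease entities are identified as either disease concepts or disease instances. The experiment is based on the Arizona disease corpus and achieved good results.
     Disease semantic relation mining based on text discovery is studied. First, texts are annotated with the disease ontology and the gene ontology, and a text-based semantic network describing the relations between diseases and gene functions is built. Second, similar sub-graphs are extracted from the network, and the relations between diseases are inferred from the similarity of the sub-graphs. An initial corpus was randomly selected from MEDLINE; the experiment achieved good performance and was able to discover latent relations between diseases.
     The problem of hypothesis generation for diseases is studied. By exploring the semantic network among diseases, gene functions and drug entities, disease-related semantic sub-networks are extracted from text, and the semantic relations between diseases and other entities are obtained. A topic model is used to extend related entities semantically, and articles are classified into four topic classes: disease and disease, disease and gene function, drug and gene function, and disease and drug. On the basis of this classification, implicit relations between entities are found from sentence-level concept co-occurrence and the semantic association between entities.
     The disease network built with the above methods is of strong practical value: it can predict hypotheses between diseases, between diseases and genes, between drugs and genes, and between diseases and drugs, providing researchers with a basis for clinical validation.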
The rapidly growing body of biomedical literature has promoted the application of text mining. As an active research topic, biomedical text mining can extract useful knowledge from large volumes of literature quickly and efficiently. Its techniques include information retrieval, text classification, named entity recognition, relation extraction and hypothesis generation. With the rapid development of gene technology, understanding pathogenic mechanisms at the molecular level has become very important. Mining disease relations and building disease-centric networks from the biomedical literature can give scientists evidence for hypothesis generation, and mining hidden information about diseases is valuable for disease prevention and the development of new drugs. Building on good performance in biomedical named entity recognition, ontology annotation is carried out on the classified literature, and the relationships between diseases and other entities are then predicted.
     Most existing methods for biomedical named entity recognition are single-phase; that is, they perform term boundary detection and semantic labeling as one task. The semi-Markov conditional random fields model (semi-CRFs) assigns a label to a segment rather than to a single word, which is more natural than word-level machine learning methods. We present a two-phase approach based on semi-CRFs and explore novel feature sets for classifying the entities in text into five types: protein, DNA, RNA, cell_line and cell_type. Our approach divides the biomedical named entity recognition (NER) task into two sub-tasks: term boundary detection and semantic labeling. In the first phase, the term boundary detection sub-task detects entity boundaries and assigns all entities a single type, C. In the second phase, the semantic labeling sub-task assigns the correct entity type to each entity detected in the first phase. We explore novel feature sets in both phases to improve performance. Our experiments, based on semi-CRFs without deep domain knowledge or post-processing algorithms, achieve an F-score of 74.64% on the JNLPBA 2004 corpus, which outperforms most state-of-the-art systems.
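     The two-phase decomposition can be illustrated with a minimal Python sketch. It is not the thesis's trained model: the decoder is a zeroth-order semi-Markov Viterbi search (no label-transition term), and `toy_boundary_score`, `toy_type`, `ENTITY_WORDS` and `TYPE_CUES` are hypothetical hand-written stand-ins for the learned feature weights of each phase.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)

def semi_markov_decode(tokens: List[str],
                       score: Callable[[List[str], int, int, str], float],
                       labels: Tuple[str, ...] = ("C", "O"),
                       max_len: int = 4) -> List[Segment]:
    """Best segmentation under a segment-level score (zeroth-order semi-Markov Viterbi)."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for label in labels:
                s = best[start] + score(tokens, start, end, label)
                if s > best[end]:
                    best[end], back[end] = s, (start, label)
    segments, end = [], n
    while end > 0:  # follow back-pointers from the end of the sentence
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return segments[::-1]

# Hypothetical cue words standing in for learned features.
ENTITY_WORDS = {"il-2", "gene", "nf-kappa", "b", "t", "cells"}
TYPE_CUES = {"protein": {"nf-kappa", "b"}, "DNA": {"gene"}, "cell_type": {"cells"}}

def toy_boundary_score(tokens, start, end, label):
    """Phase 1 stand-in: reward whole-entity C spans and single-token O spans."""
    span = [t.lower() for t in tokens[start:end]]
    hits = sum(t in ENTITY_WORDS for t in span)
    if label == "C":
        return float(hits * hits) if hits == len(span) else -float(len(span))
    return 1.0 if len(span) == 1 and hits == 0 else -float(len(span))

def toy_type(tokens, start, end):
    """Phase 2 stand-in: pick the type whose cue words overlap the segment most."""
    span = {t.lower() for t in tokens[start:end]}
    return max(TYPE_CUES, key=lambda ty: len(span & TYPE_CUES[ty]))

def two_phase_ner(tokens):
    segments = semi_markov_decode(tokens, toy_boundary_score)   # phase 1: C / O
    return [(" ".join(tokens[s:e]), toy_type(tokens, s, e))     # phase 2: entity type
            for s, e, label in segments if label == "C"]

tokens = "IL-2 gene expression requires NF-kappa B activation in T cells".split()
phase1 = semi_markov_decode(tokens, toy_boundary_score)
entities = two_phase_ner(tokens)
print(entities)
```

     Because the second phase sees only the C segments, its features can be defined over whole entities without enlarging the first phase's feature space, which is the dimensionality argument made in the abstract.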
     Up to now, biomedical text mining for diseases has been limited to the recognition of disease names. Little work has focused on disease types and the relations between diseases. Recognizing biomedical concepts in the literature is not enough on its own; annotating and normalizing the concepts against a standard Metathesaurus is even more important. We propose a system that annotates the literature with a normalized Metathesaurus. First, a two-phase semi-Markov conditional random fields (semi-CRFs) model is used to recognize disease mentions, including their location and identification. Then, we adapt the Disease Ontology (DO) to annotate the recognized diseases for normalization by computing the similarity between disease mentions and concepts. According to these similarities, the disease mentions are labeled as either disease concepts or disease instances. Experiments on the Arizona Disease Corpus show that our system performs well and outperforms other work.
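     The similarity-based annotation step might look roughly like the sketch below. The abstract does not state which similarity measure or thresholds are used, so token-level Jaccard similarity, the two threshold values, and the `TOY:` identifiers are all assumptions made for illustration; they are not real Disease Ontology entries.

```python
from typing import Dict, Optional, Tuple

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two names (an assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Hypothetical miniature ontology: made-up identifier -> preferred name.
TOY_ONTOLOGY: Dict[str, str] = {
    "TOY:001": "breast cancer",
    "TOY:002": "lung cancer",
    "TOY:003": "diabetes mellitus",
}

def annotate(mention: str,
             ontology: Dict[str, str] = TOY_ONTOLOGY,
             concept_threshold: float = 1.0,     # assumed: exact token match
             instance_threshold: float = 0.5     # assumed: partial match
             ) -> Tuple[Optional[str], str, float]:
    """Map a recognized mention to its closest concept and label it by similarity."""
    best_id, best_sim = max(
        ((cid, jaccard(mention, name)) for cid, name in ontology.items()),
        key=lambda pair: pair[1],
    )
    if best_sim >= concept_threshold:
        return best_id, "concept", best_sim
    if best_sim >= instance_threshold:
        return best_id, "instance", best_sim
    return None, "unmatched", best_sim

print(annotate("Breast cancer"))
print(annotate("invasive breast cancer"))
print(annotate("headache"))
```

     Under these assumptions a mention that matches a concept name exactly is labeled a concept, while a more specific mention that only partly overlaps a concept name is treated as an instance of that concept.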
     A great deal of knowledge is hidden in the biomedical literature. As the volume of this literature keeps growing, mining relations automatically has become urgent. The relations between diseases and gene functions are still waiting to be mined. We propose a method for mining relations between diseases that share gene functions, using literature annotated with a normalized Metathesaurus. First, a two-phase semi-CRFs model is used to recognize disease mentions and gene function mentions, including their location and identification. Second, we adapt the Disease Ontology (DO) and the Gene Ontology (GO) to annotate the recognized diseases and gene functions for normalization by computing the similarity between mentions and concepts; according to these similarities, the mentions are labeled as either concepts or instances. Third, we build a network and measure relations between diseases by computing similarities between common sub-graphs. The experiments were carried out on a corpus randomly selected through GoPubMed, covering diseases and the three domains of GO. The results reveal many hidden relations between diseases and offer an explanation for them.
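     As a rough illustration of relating diseases through shared gene functions, the sketch below builds a small disease-to-function network and scores each disease pair by the Jaccard overlap of their neighbourhoods. This overlap is a deliberately simple proxy for the sub-graph similarity mentioned above, whose exact definition the abstract does not give; the node names in `TOY_EDGES` are placeholders.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group annotated (disease, gene function) pairs into disease -> functions."""
    network: Dict[str, Set[str]] = {}
    for disease, function in edges:
        network.setdefault(disease, set()).add(function)
    return network

def disease_similarity(network: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Score every disease pair by the overlap of their gene-function neighbours."""
    scores = {}
    for a, b in combinations(sorted(network), 2):
        shared = network[a] & network[b]   # functions common to both diseases
        union = network[a] | network[b]    # never empty: each disease has an edge
        scores[(a, b)] = len(shared) / len(union)
    return scores

# Placeholder annotations, as if produced by the DO/GO annotation step.
TOY_EDGES = [
    ("disease_A", "function_1"), ("disease_A", "function_2"),
    ("disease_B", "function_2"), ("disease_B", "function_3"),
    ("disease_C", "function_4"),
]

scores = disease_similarity(build_network(TOY_EDGES))
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 3))
```

     The shared functions behind a high score (here `function_2` for `disease_A` and `disease_B`) are what would serve as the explanation for a proposed relation.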
     Finally, we study hypothesis generation for diseases. We build semantic networks among diseases, gene functions and drug entities, extract disease-related sub-networks, and obtain the semantic relationships between diseases and other entities from text. We use a topic model to extend entities semantically. The documents are classified into four topics: diseases and diseases, diseases and gene functions, drugs and gene functions, and diseases and drugs. We then mine hidden relationships among entities according to sentence-level co-occurrence and the semantic association of entities.
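     Sentence-level co-occurrence, one of the two signals named above, can be sketched as follows. The example covers only the counting step, not the topic model or the semantic association measure; entity spotting by case-insensitive substring match is a crude assumption, and the sentences are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

def cooccurrence_counts(sentences: Iterable[str],
                        entities: List[str]) -> Counter:
    """Count how often each pair of entities appears in the same sentence."""
    counts: Counter = Counter()
    for sentence in sentences:
        text = sentence.lower()
        # Assumed entity spotting: plain substring match on lower-cased text.
        present = sorted(e for e in entities if e.lower() in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

toy_sentences = [
    "Aspirin reduces inflammation in arthritis.",
    "Arthritis is linked to inflammation.",
    "Aspirin is widely used.",
]
toy_entities = ["aspirin", "arthritis", "inflammation"]

pair_counts = cooccurrence_counts(toy_sentences, toy_entities)
print(pair_counts.most_common())
```

     Pairs that co-occur often but have no recorded direct relation are natural candidates for hypotheses to pass on for further checking.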
     The disease network built with the above methods is therefore of practical use. It can predict hypotheses among diseases, drugs and gene functions, and provides researchers with evidence for testing.
