The Research of Question Analysis Based on Ontology and Architecture Design for Question Answering System in Agriculture


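English title: The Research of Question Analysis Based on Ontology and Architecture Design for Question Answering System in Agriculture
Author: Hu Depeng (胡德鹏)
Degree level: Doctoral
Discipline: Information Technology and Digital Agriculture
Keywords (from the Chinese record): agricultural information; domain ontology; information extraction; question analysis; question answering system
Keywords (from the English record): agriculture information; domain ontology; information extraction; question answering (QA) system
Degree year: 2013
Supervisor: Wang Wensheng (王文生)
Discipline code: 081203
Degree-granting institution: Chinese Academy of Agricultural Sciences (中国农业科学院)
Thesis submission date: 2013-04-01
Chair of the defense committee: Chen Liangyu (陈良玉)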

Abstract

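Over the past two decades, computer and network technologies have developed rapidly and spread widely in agriculture. Applications of information technology in agriculture have drawn growing public attention, and agricultural information now reaches an ever broader range of users. Agricultural informatization therefore faces new challenges. In particular, how to meet the needs of agricultural users at different levels, and how to deliver agricultural technology to those users quickly and accurately through information technology, have become pressing problems in building agricultural information services. A question answering system is an integrated information system that combines artificial intelligence, information retrieval, natural language processing, information extraction and related techniques. It offers a simple input interface, analyzes and processes questions that users pose in natural language, and returns a concise answer, which suits the needs of agricultural users well. Applying question answering to agricultural information, through retrieval, extraction and mining of domain information, can address the breadth of knowledge and the structural complexity involved in agricultural technology and can improve the precision of information access.
     Following the components of a question answering system, this thesis studies several key problems in each:
     1. The thesis first introduces and analyzes the theoretical foundations and state of development of natural language processing, information retrieval, information extraction and ontology. Drawing on prior research on question answering systems, it presents the logical components of such a system and then analyzes the main research focuses and difficulties component by component. In light of the current state of agriculture in China, it analyzes the problems facing agricultural information technology and argues for the feasibility of applying question answering systems to the agricultural domain.
     2. The construction of an agricultural ontology is discussed. First, the basic concepts of ontologies and the specifications and workflow for building them are studied. Second, the thesis focuses on methods for extracting concepts and relations during ontology construction. To address the sparsity of ontology relations that arises when an agricultural thesaurus is converted into an agricultural ontology, it proposes a supervised ontology relation extraction method based on mutual information.
     3. Problems in question analysis are studied. First, the notion of domain feature words is introduced and used to describe relations in the ontology. Second, an algorithm based on hidden Markov chains is proposed for recognizing and extracting domain feature words, which makes it possible to analyze the semantic information contained in a question and the feature words of the domain. Third, question classification is studied: an ontology-based method for computing concept similarity is given, and a question classification method is proposed that is based on the similarity between the feature words of a question and the feature words of each question class.
     4. Ontology-based information retrieval is studied, with emphasis on how to build a document retrieval model on the agricultural ontology. A method for computing the relevance between a question and a document is given, and a document retrieval model based on the domain ontology is proposed.
     5. Answer extraction is an important component of a question answering system. The thesis proposes an answer extraction method based on LDA, which consists of the following steps. First, Gibbs sampling is used for inference, the model parameters are computed indirectly, the probability distributions over words are obtained, and an LDA topic model is built. Second, Clarity is used to measure the similarity between blocks, segment boundaries are identified at local minima, and the document is split into paragraphs. Third, topic words are extracted for each segment according to the Shannon information of the words and are expanded through background word clustering and topic word association to form a topic word string for each paragraph. Fourth, the similarity between the question and each paragraph's topic word string is computed, and the paragraph with the highest similarity is taken as the answer.
     6. The architecture of a question answering system for agriculture is studied, and an architecture design method based on cloud computing is proposed. The storage layer uses the open-source distributed file system HDFS and the non-relational database HBase. The thesis introduces and analyzes the principles of HDFS and HBase, describes how they are applied in the agricultural question answering system, and, combining the algorithms above, proposes a logical architecture for a question answering system for the agricultural domain.
     7. Experimental methods are designed for the question answering system and evaluation criteria are selected. The main experiments cover domain feature word recognition and question classification in question analysis, ontology-based information retrieval, and the accuracy of answer extraction for the agricultural domain. A data model is designed for each experiment, and the results are analyzed to demonstrate the performance of the proposed methods.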
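Item 2 names mutual information as the basis for relation extraction, but the abstract gives no formula or training procedure. As a minimal sketch only, the following scores concept pairs by pointwise mutual information over sentence-level co-occurrence. The function name `pmi_scores`, the toy sentences, and the choice of sentence-level counting are assumptions made for illustration, and the supervised step described in the thesis is not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences, concepts):
    """Return PMI (log base 2) for every concept pair that co-occurs in a sentence.

    sentences: list of token lists (hypothetical, already segmented text)
    concepts:  set of ontology concept labels to look for
    """
    n = len(sentences)
    single = Counter()  # number of sentences containing each concept
    pair = Counter()    # number of sentences containing both concepts of a pair
    for tokens in sentences:
        present = sorted(set(tokens) & concepts)
        single.update(present)
        pair.update(combinations(present, 2))

    scores = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy data, invented for illustration only.
sentences = [
    ["wheat", "rust", "fungicide"],
    ["wheat", "rust"],
    ["rice", "blast"],
    ["wheat", "irrigation"],
]
concepts = {"wheat", "rust", "rice", "blast", "fungicide", "irrigation"}
scores = pmi_scores(sentences, concepts)
```

Pairs with high scores would be candidates for a new ontology relation. How candidates are then labelled or filtered is left open here.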
Over the last two decades, telecommunication networks have spread into the countryside in China, and some farmers now use personal computers to access the Internet. How to accommodate the specific interests of users of agricultural information, and how to deliver agricultural technology information accurately, have become critical challenges for information technology in agriculture.
     A Question Answering (QA) system is a hierarchical, comprehensive system whose research branches include Artificial Intelligence (AI), Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). Applying QA to meet the requirements of agricultural users by retrieving, extracting, and mining information from the Internet is a feasible solution. This thesis focuses on the key problems of QA. The main work is as follows:
     1. First, the thesis introduces the foundational concepts of NLP, IR, IE, and ontology, and outlines their development. Then, on the basis of existing research on QA systems, it analyzes the logical structure of QA over free text, focusing on the research methods and basic framework of QA. The development of agricultural information technology with Chinese characteristics is briefly introduced, including the application of QA systems in agriculture.
     2. This part proposes a novel semi-supervised method for learning domain ontology relations. The key problem is how to enrich the relations between concepts. Based on analysis of text information, the thesis proposes a method for extracting ontology relations using mutual information.
     3. Semantic analysis of a question is the key to capturing the user's requirement. To describe the relationships between concepts, the thesis proposes concept-features for representing domain-specific concepts. A novel algorithm based on the hidden Markov model (HMM) is proposed for extracting concept-feature words, and the key points in learning the model structure and estimating its parameters are analyzed. During processing, the algorithm makes full use of format information such as list separators and special labels to segment text, and extracts information for specific fields based on the HMM.
     4. IR is one of the main parts of QA. The research here focuses on the information retrieval model. An ontology-based information retrieval model is introduced, which is based on computing the equivalence classes of ontology individuals. The ontology is generated using a basic description logic, which offers a suitable tradeoff between the expressivity of knowledge and the complexity of reasoning.
     5. Answer extraction is a key problem of QA. This thesis proposes an answer extraction algorithm based on Latent Dirichlet Allocation (LDA). The main steps are as follows:
     First, the topic-word and document-topic distributions are inferred by Gibbs sampling, from which an LDA model of the text is built. Second, text segmentation is performed based on LDA models of the corpora and texts: Clarity is taken as the metric for similarity between blocks, and segmentation points are identified at local minima. Third, the topic words of each segment are extracted according to Shannon information. Words that do not appear explicitly in the analyzed text can be included to express the topics, with the help of background word clustering and topic word association, in an attempt to uncover the meaning behind the words. Finally, the similarity between the question and each paragraph is calculated, and the paragraph with the highest similarity is taken as the answer.
     6. The architecture of the QA system, which is built on Hadoop and HBase, is described in detail. The principles and application of the open-source Hadoop distributed file system and the non-relational database HBase are introduced. A method for developing a QA system on Hadoop and HBase is proposed. The function of each part of the QA system is presented, and an integrated performance analysis of the system is given.
     7. Experimental methods and data models for the QA system are designed, including an analysis of evaluation criteria. First, the results of the experiments on concept-feature word extraction and question classification are analyzed. Then the recall of the ontology-based information retrieval experiments is described and compared with the keyword method. Finally, the accuracy of answer extraction based on the LDA model is analyzed, mainly for agriculture-based question classification. The experimental results demonstrate that the methods proposed in this thesis can enhance the performance of question answering systems in agriculture.
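The last step of the LDA-based procedure in item 5 picks the paragraph most similar to the question. The abstract does not say which similarity measure is used, so the sketch below assumes cosine similarity over bags of words. `cosine`, `select_answer`, and the sample data are hypothetical names and values chosen for illustration, and the topic word strings are taken as given rather than derived from an LDA model.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_answer(question_terms, segment_topic_words):
    """Return (index, score) of the segment whose topic words best match the question.

    Ties are broken in favour of the earliest segment.
    """
    q = Counter(question_terms)
    scored = [
        (cosine(q, Counter(words)), i)
        for i, words in enumerate(segment_topic_words)
    ]
    best_score, best_index = max(scored, key=lambda t: (t[0], -t[1]))
    return best_index, best_score


# Toy data, invented for illustration only.
question = ["wheat", "rust", "control"]
segments = [
    ["rice", "blast", "symptom"],
    ["wheat", "rust", "fungicide", "control"],
    ["wheat", "irrigation", "schedule"],
]
best_index, best_score = select_answer(question, segments)
```

With this toy input the second segment is selected, because it shares three terms with the question while the others share at most one.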

    219. Banerjee S, Psdersen T. Extended gloss overlaps as a measure of semantic relatedness [C]//Proceedingof IJCAI. Mexico2003:805-810.
    220. Patwardhan S, Pedersen T. Using WordNet-based Context Vectors to Estimate the SemanticRelatedness of Concepts [C]//Proceeding of the EACL Workshop on Making Sense: BringingComputational Linguistics and Psycholinguistics Together. Trento, Italy, April2006:1-8.
    221. Wan S, Angryk R A. Measuring semantic similarity using wordnet-based context vectors [C]//Systems,Man and Cybernetica.2011:908-913.
    222. Anna F. Concept similarity by evaluating information contents and feature vectors: a combinedapproach [J]. Communications of the ACM,2009,52(3):145-149.
    223. Zhao Zhong-cheng, Yan Jian-zhon, Fang Li-ying, et al. Measuring Semantic Similarity Based onwordnet [C]//Web information system and application conference.2009,89-92.
    224. Cai Soong-mai, Lu Zhao. An Improved Semantic Similarity Measure for Word Pairs [C]//InternationalConference on e-Education,e-Business,e-Management and e-Learning.2010:212-216.
    225. Qin Peng,Lu Zhao, Yan Yu, et al. A New Measure of word semantic similaeity based on WordNethierarchy and DAG theory [C]//International Conference on Web information systems and Mining.2010:181-185.
    226. Sheng Yan, Li Yun, Luan Luan. A Concept similarity method in structural and semantic levels [C]//Second international symposium on information science and engineering:620-623.
    227. Shi Bin,Fang Li-ying,Yan Jian-zhou, et al. Ontology-based measure of semantic similarity betweenconcepts [C]//World Congress on software engineering.2010,2:109-112.
    228. Gerasimos S, Georgios S, Andreas S. A hybrid Web-based measure for computing semantic relatednessbetween words [C]//200921stIEEE International Conference on Tools with Artificial Intelligence,ICTAI.2009:441-448.
    229. Strube M, Ponzetto S. Wiki Relate Computing semantic relatedness using Wikipedia [C]//Proceedingof AAAI.2006.
    230. Gabrilovich E, Markovitch S. Computing semantic relatedness using Wikipedia-based explicit semanticanalysis [C]//IJCAI.2007:1606-1611.
    231. Milne D. Computing semantic relaedness using Wikipedia link structure [C]//NZSRSC’07.2007.
    232. Ehrig M，Staab S．QOM-quick ontology mapping//LNCS3298：Proc of the4th Int Semantic WebConf．Berlin：Springer,2004：683-697.
    233. Rodrlguez A M,Egenhofer M J．Determining semantic similarity among entity classes from differentontologies．IEEE Trans on Knowledge and Data Engineering,2003,15(2)：442-456.
    234. Frakes WB, Baeza-Yates R. Information retrieval: data structures and algorithms. Pretice Hall,EnglewoodCliffs,NJ,USA.1992.
    235. H. P. Luhn. A statistical approach to mechanized encoding and searching of literary information. IBMJournal of Research and Development1(4):309-317,1957.
    236. H.P. Luhn. The automatic creation of literature abstracts. IBM Journal of Research and Development2(2):159-165,317,1958.
    237. Ponte J, CroftWB. Alanguage modeling approach to information retrieval[C].In Proceedings of the2lstannual international ACM-SIGIR Conference on research and development in information retrieval,1998,275-281.
    238. Rosenfeld R.2000. Two decades of statistical language modeling: where do we go from here[C]//InProceedings of the IEEE,88(8),2000.
    239. Ren H, Ji D, He Y, et al. Multi-Strategy Question Answering System for NTCIR-7CCTask[C]//Proceedings of the Seventh NTCIR Workshop Meeting.2008.
    240. Yongzheng WU,Wenliang Chen, Hideki Kashioka. NiCT/ART in NTCIR-7C CCLQA TrackAnswering Complex Cross-language Questions Proceedings of NTCIR-7Workshop Meeting,2008
    241. M.E.J Newman. The structure and function of complex networks. SIAM Review,45(2):167-256,2003.
    242. Dumais S T. Using LSl for information Retrieval,Information Filtering,and other Things[C]. Proe.ofTalk at Cognitive Technology WorkshoP.1997:4-5.
    243. Hofmann T. Probabilistic latent semantic indexing[C]//Proceedings of the22nd annual internationalACM SIGIR conference on Research and development in information retrieval. ACM,1999:50-57.
    244. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation.Journal of Machine Learning Research,3:993–1022,2003.
    245. Xu F, Kurz D, Piskorski J, et al. A domain adaptive approach to automatic acquisition of domainrelevant terms and their relations with bootstrapping[C]//Proc. of LREC.2002.
    246. Griffiths T L, Steyvers M. Finding scientific topics [J]. Proceedings of the National academy ofSciences of the United States of America,2011,101(Suppl1):5228-5235.
    247. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters[J]. Communications ofthe ACM,2010,51(1):107-113.
    248. Ghemawat S, Gobioff H, Leung S T. The Google file system[C]//ACM SIGOPS Operating SystemsReview. ACM,2003,37(5):29-43.
    249. Chang F, Dean J, Ghemawat S, et al. Bigtable: A distributed storage system for structured data[J]. ACMTransactions on Computer Systems (TOCS),2011,26(2):4.
    250. DeCandia G, Hastorun D, Jampani M, et al. Dynamo: amazon's highly available key-valuestore[C]//ACM Symposium on Operating Systems Principles: Proceedings of twenty-first ACMSIGOPS symposium on Operating systems principles.2007,14(17):205-220.
    251. Zhao S, Grishman R. Extracting relations with integrated information using kernelmethods[C]//Proceedings of the43rd Annual Meeting on Association for Computational Linguistics.Association for Computational Linguistics,2005:419-426.
