Research on Question Analysis for Restricted-Domain Chinese Question Answering Systems
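Abstract
A question answering system is a new generation of intelligent search engine: it lets users pose questions in natural language and returns accurate answers. Compared with traditional search engines, a question answering system better satisfies users' query needs and retrieves the answers they need more accurately. Question analysis is a very important component of a question answering system, and its accuracy directly affects the accuracy of final answer extraction. This paper uses natural language processing techniques to study key question answering technologies in a restricted domain, namely domain knowledge base construction, question representation, question classification, and question similarity computation, and on that basis implements a question answering system over a Yunnan tourism FAQ base. Specifically, the paper makes the following main contributions:
     (1) To address the lack of domain concept descriptions in the HowNet commonsense knowledge base, a method for representing domain knowledge and for extracting and constructing a domain ontology is proposed. Drawing on ontology theory, the method uses HowNet's concept description language to describe domain terms precisely, thereby building a domain knowledge base and merging it with the commonsense knowledge base.
     (2) A formal representation of question information (question representation) is proposed. The method uses lexical and semantic analysis to extract and expand the keywords of a domain question, uses syntactic dependency analysis to extract the question's dependency tree, and obtains the question focus and answer type through rules that map question types to question focuses and answer types.
     (3) A domain question classification method that combines rules with statistical learning is proposed. The method first extracts question category rules from linguistic rules and the characteristics of the domain knowledge. It then extracts syntactic structure relations and domain features and builds a question classification model with an improved Bayesian classification learning algorithm. Finally, it combines the rule-based and statistical methods to classify domain questions. Experimental results show that the method performs well.
     (4) To address the shortcomings of current question similarity methods, and taking into account the characteristics of Chinese questions within a domain, a method for computing domain question similarity is proposed. Based on the domain knowledge base and the commonsense knowledge base, the method computes semantic similarity between words, extracts the syntactic dependency pairs of each question, and computes the similarity between those dependency pairs, yielding a similarity measure that fuses lexical, syntactic, semantic, and domain knowledge. Experimental results show that the method performs well.
     (5) Using the results above and taking the Yunnan tourism domain as an example, domain features were collected and organized, and a prototype question answering system over a Yunnan tourism FAQ base was implemented.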
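The combination of rules and statistical learning described in item (3) can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the method from this work: the rule triggers, class labels, and training questions are hypothetical, and a plain multinomial naive Bayes over word tokens stands in for the improved Bayesian model trained on syntactic and domain features.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-written rules: (trigger phrase, question class).
RULES = [("where", "LOCATION"), ("when", "TIME"), ("how much", "PRICE")]


class NaiveBayes:
    """Plain multinomial naive Bayes with add-one smoothing (an assumption;
    the improved Bayes variant of the source is not reproduced here)."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(question, model):
    """Apply the rules first; fall back to the statistical model."""
    tokens = re.sub(r"[^\w\s]", " ", question.lower()).split()
    padded = " " + " ".join(tokens) + " "
    for trigger, label in RULES:
        if f" {trigger} " in padded:  # whole-word phrase match
            return label
    return model.predict(tokens)


# Toy, invented training data for the statistical fallback.
TRAIN = [
    ("ticket price stone forest".split(), "PRICE"),
    ("cost of hotel in lijiang".split(), "PRICE"),
    ("bus route to dali old town".split(), "ROUTE"),
    ("train route from kunming to lijiang".split(), "ROUTE"),
]
model = NaiveBayes()
model.fit(TRAIN)

print(classify("Where is Stone Forest?", model))         # decided by a rule
print(classify("Cheapest hotel cost near Dali", model))  # decided by the model
```

Checking the rules first keeps high-precision patterns from being overridden by a classifier trained on little data. Other combinations, such as feeding rule matches to the classifier as features, are equally plausible readings of the description.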
As a new generation of intelligent search engine, a question answering system lets users ask questions in natural language and can supply more accurate answers than traditional search engines. Question analysis is a very important component of a question answering system, and its accuracy directly affects the accuracy of the final answer extraction. In this paper, a Yunnan tourism FAQ question answering system is built on natural language processing technology, covering domain knowledge base construction, formal question representation, question classification, and question similarity calculation. The main distinctive achievements are as follows:
     (1) To address the deficiency of the commonsense knowledge base HowNet in describing domain concepts, a method for domain knowledge representation and for domain ontology extraction and construction is proposed. The method is ontology-oriented: it describes domain concepts in HowNet's concept description language, constructs a domain knowledge base, and integrates it with the commonsense knowledge base.
     (2) A formal question representation method is proposed. The method extracts and expands the keywords of a domain question through lexical and semantic analysis, and extracts the question's syntactic dependency tree through dependency analysis. The question focus and answer type are obtained from rules that map question type to question focus and answer type.
     (3) A domain question classification method based on language rules and statistical learning is put forward. First, question classification rules are extracted from language rules and domain knowledge. Then, a question classification model is constructed by extracting syntactic structure relations and domain features and applying an improved Bayes classification learning algorithm. Finally, domain question classification is realized by combining the language rules with statistical learning. Experiments show that the proposed method performs well.
     (4) To address the deficiencies of current question similarity methods, a domain question similarity calculation method that takes into account the features of Chinese domain questions is put forward. Based on the domain knowledge base and the commonsense knowledge base, the method calculates the semantic similarity between words, extracts the syntactic dependency pairs of each question, and calculates the similarity between those pairs, giving a domain question similarity that combines lexical, syntactic, semantic, and domain knowledge. Experimental results show that the method performs well.
     (5) Domain features are collected, and a prototype Yunnan tourism FAQ question answering system is implemented based on the research above.
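The similarity calculation in item (4) fuses word-level semantic similarity with similarity between syntactic dependency pairs. The sketch below shows one way such a fusion could be arranged. It is an illustration under assumptions, not the formula used in this work: the `WORD_SIM` table is a hypothetical stand-in for a HowNet-based word similarity, and the relation labels, the best-match averaging, and the weight `alpha` are all invented for the example.

```python
# Hypothetical word-similarity table standing in for a HowNet-based measure.
WORD_SIM = {
    frozenset(("price", "cost")): 0.9,
    frozenset(("ticket", "admission")): 0.8,
}


def word_sim(a, b):
    """Semantic similarity of two words in [0, 1]."""
    if a == b:
        return 1.0
    return WORD_SIM.get(frozenset((a, b)), 0.0)


def pair_sim(p, q):
    """Similarity of two dependency pairs (head, dependent, relation).
    Assumption: pairs with different relations do not match at all."""
    if p[2] != q[2]:
        return 0.0
    return 0.5 * (word_sim(p[0], q[0]) + word_sim(p[1], q[1]))


def best_match_avg(xs, ys, sim):
    """Symmetric average of each item's best match on the other side."""
    if not xs or not ys:
        return 0.0
    forward = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sim(y, x) for x in xs) for y in ys) / len(ys)
    return 0.5 * (forward + backward)


def question_sim(words_a, deps_a, words_b, deps_b, alpha=0.5):
    """Weighted fusion of lexical and dependency-pair similarity."""
    lexical = best_match_avg(words_a, words_b, word_sim)
    syntactic = best_match_avg(deps_a, deps_b, pair_sim)
    return alpha * lexical + (1 - alpha) * syntactic


# Two toy questions, already reduced to keywords and dependency pairs.
words_1, deps_1 = ["ticket", "price"], [("price", "ticket", "ATT")]
words_2, deps_2 = ["admission", "cost"], [("cost", "admission", "ATT")]

print(question_sim(words_1, deps_1, words_2, deps_2))
```

In an FAQ setting, a score of this kind would be computed between the user's question and each stored question, with the answer of the highest-scoring stored question returned when the score clears a threshold.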
