Research on Ontology-Based Text Classification of Telephone Content
Abstract
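The rapid growth of the Internet has diversified the ways in which it is accessed. People are no longer content to browse the Internet only through a computer browser, and more and more of them want to browse web pages with telephones, mobile phones, and other communication devices. Compared with images and text, whose expressive power is limited, people prefer to communicate in natural language, so friendly voice interaction is increasingly favored. VoiceXML is built on the XML specification and is a standard for exchanging voice data. It gives users a platform for accessing network resources with voice tools. As a voice data exchange standard, VoiceXML can exchange data seamlessly with databases and with other data documents built on the XML standard, and so it ties the Internet closely to the telephone network.
     The VoiceXML voice gateway submits user documents to the server. As the amount of information submitted by users grows, the server comes under heavy pressure when processing this mass of documents, and there is an urgent need to classify the information automatically and then process the documents of each category separately. In the past, information was retrieved and classified by the keywords alone, with unsatisfactory accuracy and efficiency, because a computer cannot understand the semantic information the keywords carry. To capture semantic information better, the concept of ontology is introduced here. An ontology can be used to describe and analyze the semantics of keywords, and ontology modeling can express deeper semantic information. Traditional retrieval algorithms rely only on simple matching of characters and words at the syntactic level and lack the ability to represent, process, and understand knowledge. The key to solving these problems is to raise information retrieval from keyword-based syntactic matching to semantic matching at the level of knowledge (or context).
     An ontology is a knowledge representation tool, and practical applications may require logical reasoning based on rules. Ontology reasoning means extracting the knowledge that is implicit in explicit definitions and declarations. An ontology is a formal specification of a shared conceptual model and a description of knowledge; to apply it to semantic analysis, rules must be used and reasoning carried out over them. Predicate logic is an important means of expressing knowledge reasoning. A rule base can be built on top of the knowledge base represented by the ontology and used to analyze the semantic information of text.
     This thesis uses the OWL language to describe domain knowledge and a rule system to express the reasoning rules. Many tools are currently available for editing and developing ontologies; this thesis adopts Protégé 3.2.1 from Stanford University as the platform for building the ontology. On this platform we built a simulated partial ontology of school logistics management, and on the basis of that ontology we constructed a rule set for reasoning over text information. To solve the problem of automatic text classification, this thesis proposes ontology-based classification of telephone content. An ontology is a modeling tool that can describe knowledge models at the semantic and knowledge levels, and its application to text classification has improved classification precision and speed.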
The Internet is developing rapidly, and the ways of accessing it are diversifying. People are no longer satisfied with surfing the Internet only through a computer browser such as Internet Explorer; more and more users want to view web pages by telephone or mobile phone rather than on a computer screen. People prefer to communicate in natural language rather than through images and text, so friendly voice interfaces are becoming increasingly popular. VoiceXML is a standard for exchanging voice data and is based on XML. It provides a platform for accessing the Internet by voice. VoiceXML can exchange data seamlessly with databases and with other data documents based on the XML standard, and so it connects the Internet closely with the telephone network.
     The VoiceXML-based voice gateway submits users' documents to the server. As the volume of documents grows, the server comes under heavy pressure and needs to classify the information automatically so that the documents in each category can be handled separately. Searching and classifying by keywords alone has not worked well, because the computer cannot understand the semantic meaning that the keywords imply. Ontology is introduced to address this semantic problem: it can be used to describe and analyze the semantic meaning of keywords, and ontology models can express the implied semantic information. Classical search algorithms match words only at the syntactic level and lack the ability to represent, process, and understand knowledge. The key to solving these problems is to match by semantics instead of syntax.
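     The contrast between syntactic and semantic matching can be illustrated with a minimal sketch. It is not the method of the thesis: the toy ontology, the term and concept names, and the functions `keyword_match` and `semantic_match` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy ontology (assumption, not from the thesis):
# surface terms map to concepts, and concepts form an is-a hierarchy.
TERM_TO_CONCEPT = {
    "dorm": "Dormitory",
    "dormitory": "Dormitory",
    "canteen": "Canteen",
    "cafeteria": "Canteen",
}
PARENT = {
    "Dormitory": "Housing",
    "Canteen": "Catering",
    "Housing": "Logistics",
    "Catering": "Logistics",
}


def keyword_match(query, document):
    """Syntactic matching: true only if the literal query word occurs."""
    return query.lower() in document.lower().split()


def ancestors(concept):
    """Return the concept followed by all of its ancestors."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT.get(concept)
    return chain


def semantic_match(query, document):
    """Semantic matching: true if some document word denotes the
    query's concept or one of that concept's descendants."""
    target = TERM_TO_CONCEPT.get(query.lower())
    if target is None:
        return False
    for word in document.lower().split():
        concept = TERM_TO_CONCEPT.get(word)
        if concept is not None and target in ancestors(concept):
            return True
    return False


doc = "the cafeteria food is cold"
literal_hit = keyword_match("canteen", doc)    # the word "canteen" is absent
semantic_hit = semantic_match("canteen", doc)  # "cafeteria" denotes Canteen
```

     In this sketch the literal lookup misses the document because the word itself is absent, while the concept lookup finds it through the shared concept.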
     An ontology is a tool for describing knowledge and a form of knowledge representation, and it can serve as the basis for rule-based logical reasoning. Ontology reasoning means extracting the implied knowledge from explicit definitions and statements. An ontology is an explicit specification of a conceptualization, that is, a description of knowledge. If an ontology is to be used for semantic analysis, rules must be introduced, and reasoning is performed over those rules. Predicate logic is an important form of knowledge representation. A rule system for analyzing the semantic information of text can be constructed on top of an ontology-based knowledge repository.
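     As a minimal sketch of extracting implied knowledge from explicit statements, the following forward-chaining loop applies two simple rules to a set of triples. The triples, the class names, and the function `infer` are assumptions made for illustration; the thesis does not specify its rule engine at this level of detail.

```python
def infer(facts):
    """Forward-chain two rules until no new triple appears.

    Rule 1: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A)       and (A subClassOf B) => (x type B)
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s1, p1, o1) in known:
            for (s2, p2, o2) in known:
                # The object of the first triple must be the subclass
                # named as subject of the second triple.
                if p2 == "subClassOf" and o1 == s2:
                    if p1 == "subClassOf":
                        new.add((s1, "subClassOf", o2))
                    elif p1 == "type":
                        new.add((s1, "type", o2))
        if not new <= known:
            known |= new
            changed = True
    return known


# Hypothetical explicit statements about a school logistics domain.
facts = {
    ("Dormitory", "subClassOf", "Housing"),
    ("Housing", "subClassOf", "LogisticsService"),
    ("room_101", "type", "Dormitory"),
}
closure = infer(facts)
```

     Here `closure` also contains triples that were never stated explicitly, such as `room_101` being a `LogisticsService`.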
     OWL is used here to describe the domain knowledge, and a rule system is used to express the reasoning mechanism. Many tools exist for editing and developing ontologies; Protégé 3.2.1, developed by Stanford University, is the platform used to construct the ontology here. Protégé is an open, extensible ontology editor based on Java, and it provides many plug-ins and APIs. We build a simulated ontology of the administration of a college, and on top of that ontology a rule system that is used to reason over the text information.
     We propose classifying the text content of telephone calls by means of an ontology in order to solve the problem discussed above. An ontology is a modeling tool for expressing semantic meaning and knowledge. It is applied to text classification to increase precision and speed.
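     One possible reading of ontology-based classification is sketched below: each word that names a concept votes for the category that concept belongs to, and the message is assigned to the category with the most votes. The vocabulary, the categories, and the function `classify` are hypothetical; the thesis may use a different scheme.

```python
from collections import Counter

# Hypothetical vocabulary and category assignments (assumptions).
TERM_TO_CONCEPT = {
    "dorm": "Dormitory",
    "heating": "Heating",
    "canteen": "Canteen",
    "meal": "Meal",
    "menu": "Meal",
}
CONCEPT_TO_CATEGORY = {
    "Dormitory": "Housing",
    "Heating": "Housing",
    "Canteen": "Catering",
    "Meal": "Catering",
}


def classify(text, default="Unclassified"):
    """Assign text to the category whose concepts it mentions most."""
    votes = Counter()
    for word in text.lower().split():
        # Strip trailing punctuation so "dorm." still matches "dorm".
        concept = TERM_TO_CONCEPT.get(word.strip(".,!?;:"))
        if concept is not None:
            votes[CONCEPT_TO_CATEGORY[concept]] += 1
    if not votes:
        return default
    return votes.most_common(1)[0][0]


housing_label = classify("The heating in my dorm is broken.")
catering_label = classify("Is the canteen menu updated?")
```

     A message with no known terms falls back to the `default` label, which keeps unknown content out of the real categories.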
