Research on the Automatic Generation of an Ontology from the Chinese Thesaurus (《汉语主题词表》)
Abstract
The essence of a thesaurus is to select, standardize, and control the vocabulary of a natural language and to reveal the relationships among its terms; the resulting set of controlled vocabulary, built on natural language, constitutes the thesaurus. Each term in a thesaurus is called a descriptor, and each descriptor expresses a specific concept. The Chinese Thesaurus (《汉语主题词表》) is China's large-scale, comprehensive retrieval tool for science and technology. Its descriptors cover the main terminology of the natural sciences, medicine, agriculture, engineering, and other disciplines, and it is a principal tool for subject indexing, retrieval, and the organization of catalogues and indexes. An ontology, by contrast, is a newer method of organizing networked information: a conceptual modeling tool that describes information at the level of semantic knowledge, and one that has attracted growing attention. Because ontologies are inherently complex, however, building one is currently time-consuming and laborious. A thesaurus already gathers the knowledge of many domain experts and contains a relatively complete vocabulary for each discipline, so it has been proposed that ontologies be built on existing thesauri. The simple semantic relationships in a thesaurus can guide the creation of an ontology's properties, instances, and relations. This paper takes the Chinese Thesaurus as its core and constructs a thesaurus-based ontology, which has theoretical significance and practical value for the development of thesauri on the Semantic Web.
     The paper analyzes in detail the definitions, constituent elements, and characteristics of the Chinese Thesaurus and of ontologies, and from this analysis establishes the necessity and advantages of converting the Chinese Thesaurus into an ontology. Drawing on the features of the Chinese Thesaurus, it proposes its own sequence of steps for ontology construction and, to address the shortcomings of the traditional print edition, recasts the thesaurus as an SQL database. A Java program that calls the Jena package then automatically generates an ontology template based on the Chinese Thesaurus and extracts restrictions according to rules. The result is an automatic conversion from the Chinese Thesaurus SQL database to an ontology, with the thesaurus formally represented in OWL, an ontology representation language.
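The conversion described above can be illustrated with a minimal sketch. The abstract states that the work uses Java with the Jena package; to stay self-contained, the sketch below does not use Jena and instead writes OWL (RDF/XML) text directly with the Java standard library. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the `Relation` row shape, the relation codes `BT`/`RT`/`UF`, the placeholder namespace, the `relatedTo` property, and the mapping rules themselves (broader term to `rdfs:subClassOf`, related term to an `owl:someValuesFrom` restriction, used-for term to `rdfs:label`).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: turns thesaurus relation rows into OWL (RDF/XML) text.
// The thesis uses Jena; this version uses plain string building to avoid dependencies.
class ThesaurusToOwl {

    // Placeholder namespace (assumption, not from the thesis).
    static final String BASE = "http://example.org/chinese-thesaurus#";

    // One relation row, as it might be read from the thesaurus SQL database.
    // Field names and relation codes are hypothetical.
    static final class Relation {
        final String term;    // the descriptor
        final String type;    // "BT" (broader), "RT" (related), "UF" (used for)
        final String target;  // the term on the other side of the relation

        Relation(String term, String type, String target) {
            this.term = term;
            this.type = type;
            this.target = target;
        }
    }

    // Escapes characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Emits one owl:Class description per row. Terms used in IRIs are assumed
    // to be single tokens; a real converter would need proper IRI encoding.
    static String toOwl(List<Relation> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
          .append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
          .append("         xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"\n")
          .append("         xmlns:owl=\"http://www.w3.org/2002/07/owl#\">\n")
          .append("  <owl:ObjectProperty rdf:about=\"").append(BASE).append("relatedTo\"/>\n");
        for (Relation r : rows) {
            sb.append("  <owl:Class rdf:about=\"").append(BASE).append(escape(r.term)).append("\">\n");
            switch (r.type) {
                case "BT": // broader term becomes a superclass
                    sb.append("    <rdfs:subClassOf rdf:resource=\"")
                      .append(BASE).append(escape(r.target)).append("\"/>\n");
                    break;
                case "RT": // related term becomes an existential restriction
                    sb.append("    <rdfs:subClassOf>\n")
                      .append("      <owl:Restriction>\n")
                      .append("        <owl:onProperty rdf:resource=\"").append(BASE).append("relatedTo\"/>\n")
                      .append("        <owl:someValuesFrom rdf:resource=\"")
                      .append(BASE).append(escape(r.target)).append("\"/>\n")
                      .append("      </owl:Restriction>\n")
                      .append("    </rdfs:subClassOf>\n");
                    break;
                case "UF": // non-preferred term is kept as a label
                    sb.append("    <rdfs:label>").append(escape(r.target)).append("</rdfs:label>\n");
                    break;
                default:
                    throw new IllegalArgumentException("Unknown relation type: " + r.type);
            }
            sb.append("  </owl:Class>\n");
        }
        return sb.append("</rdf:RDF>\n").toString();
    }

    public static void main(String[] args) {
        // Sample rows are invented for the example.
        List<Relation> rows = Arrays.asList(
            new Relation("Mineralogy", "BT", "Geology"),
            new Relation("Mineralogy", "RT", "Crystallography"),
            new Relation("Mineralogy", "UF", "Mineral science"));
        System.out.print(toOwl(rows));
    }
}
```

In a Jena-based version, the string building would presumably be replaced by calls on an ontology model, and the rows would come from a JDBC query rather than a hard-coded list. The row-by-row mapping from relation type to OWL construct would stay the same.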
