A Study on Methods for Extracting Non-Categorical Relations between Chinese Domain Terms

English title: Methods of Extracting Non-Categorical Semantic Relations between Chinese Terms
Authors: Zhu Hui; Wang Hao; Su Xinning; Deng Sanhong
Keywords: Chinese domain terms; non-categorical semantic relations; ontology; domain conceptual model; term-space structure
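Journal: Journal of the China Society for Scientific and Technical Information (情报学报); journal code QBXB
Affiliations: School of Information Management, Nanjing University; Jiangsu Key Laboratory of Data Engineering and Knowledge Services (Nanjing University)
Publication date: 2018-12-24
Year: 2018
Volume: 37
Issue: 12
Pages: 23-33 (11 pages)
Funding: Jiangsu Provincial Social Science Fund project "Research on Automatic Acquisition of Semantic Relations between Domain Terms" (15TQB009); National Natural Science Foundation of China Youth Fund project "Research on TSD and TDC Measurement and Analysis for Academic Resources" (71503121)
Language: Chinese
Article ID: QBXB201812003
CN: 11-2257/G3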

Abstract

Ontology is an effective means of knowledge organization and an important step in building the Semantic Web, and the non-categorical relations between concepts are an essential part of an ontology. Because terms are the external expression of concepts, this paper first analyzes current research in China and abroad on extracting non-categorical relations between terms. Building on that analysis, it combines co-occurrence analysis, structural analysis, template construction, logical reasoning, and related methods into a model that extracts non-categorical relations between terms from unstructured Chinese domain texts, approaching the task from two perspectives: content and structure. The paper presents the model's main workflow and the main components of each functional module, discusses how those components can be implemented, and examines the limitations of the methods involved. The study offers a new approach to extracting non-categorical relations between terms, enriches the methods available for knowledge discovery, and provides a reference for building feasible and effective knowledge organization systems.
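Co-occurrence analysis and template construction, two of the content-based techniques named in the abstract, can be illustrated with a small sketch. It assumes the text is already segmented into words and that a domain term list and a verb lexicon are available; the function `candidate_relations`, the `min_count` threshold, the verb lexicon, and the toy corpus are all hypothetical and do not reproduce the authors' model. Term pairs that co-occur in at least `min_count` sentences are scored with pointwise mutual information (PMI), and a crude template (term A, then a verb, then term B) supplies the most frequent intervening verb as a tentative relation label.

```python
import math
from collections import Counter
from itertools import combinations


def candidate_relations(sentences, terms, verbs, min_count=2):
    """Return (term_a, term_b, pmi, label) tuples sorted by descending PMI.

    Hypothetical sketch, not the paper's method.
    sentences: list of token lists (text assumed already word-segmented)
    terms:     set of domain terms (assumed given)
    verbs:     set of tokens treated as relation-bearing verbs (assumed lexicon)
    """
    n = len(sentences)
    term_freq = Counter()  # number of sentences containing each term
    pair_freq = Counter()  # number of sentences containing both terms
    pair_verbs = {}        # unordered term pair -> Counter of verbs seen between them

    for tokens in sentences:
        present = sorted({t for t in tokens if t in terms})
        term_freq.update(present)
        for a, b in combinations(present, 2):
            pair_freq[(a, b)] += 1

        # Template: <term A> ... <verb> ... <term B>, with A occurring before B.
        positions = [(i, t) for i, t in enumerate(tokens) if t in terms]
        for (i, a), (j, b) in combinations(positions, 2):
            if a == b:
                continue
            between = [t for t in tokens[i + 1:j] if t in verbs]
            if between:
                key = tuple(sorted((a, b)))
                pair_verbs.setdefault(key, Counter()).update(between)

    results = []
    for (a, b), c in pair_freq.items():
        if c < min_count:
            continue
        # PMI = log(P(a,b) / (P(a) * P(b))), probabilities estimated per sentence.
        pmi = math.log((c * n) / (term_freq[a] * term_freq[b]))
        verb_counts = pair_verbs.get((a, b))
        label = verb_counts.most_common(1)[0][0] if verb_counts else None
        results.append((a, b, round(pmi, 3), label))
    return sorted(results, key=lambda r: -r[2])


# Toy corpus (invented for illustration, already segmented into words).
TOY_SENTENCES = [
    ["本体", "包含", "概念"],
    ["本体", "包含", "概念", "和", "关系"],
    ["术语", "表达", "概念"],
    ["术语", "表达", "概念"],
    ["语义网", "依赖", "本体"],
    ["语义网", "依赖", "本体"],
    ["关系", "连接", "术语"],
    ["数据", "很", "多"],
]
TOY_TERMS = {"本体", "概念", "关系", "术语", "语义网"}
TOY_VERBS = {"包含", "表达", "依赖", "连接"}

if __name__ == "__main__":
    for row in candidate_relations(TOY_SENTENCES, TOY_TERMS, TOY_VERBS):
        print(row)
```

Because pairs are stored unordered, the sketch loses relation direction: 语义网 依赖 本体 ("the Semantic Web depends on ontology") comes back only as an unordered pair labelled 依赖. A fuller template would keep term order, and the abstract's structural perspective (term-space structure, logical reasoning) would add signals that plain sentence co-occurrence cannot capture.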

