A Study of Text Structure Analysis Methods Based on Content Relevancy Computation (基于内容相关度计算的文本结构分析方法研究)

English title: Study on Method To Automatically Analyze the Text Structure Based on the Relevancy Computing of Text Content
Author: Zhong Maosheng (钟茂生)
Degree level: Doctoral
Discipline: Computer Science and Technology
Chinese keywords: 文本结构分析 ; 文本理解 ; 文本组织结构 ; 词语语义关系 ; 句际语义关系 ; 文本分割 ; 文本层次结构分析
English keywords: text structure analysis ; text understanding ; text structure ; word semantic relations ; inter-sentence semantic relations ; text segmentation ; text hierarchical structural analysis
Degree year: 2010
Advisor: Lu Ruzhan (陆汝占)
Discipline code: 081202
Degree-granting institution: Shanghai Jiao Tong University (上海交通大学)
Submission date: 2009-06-01

Abstract

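Text structure takes two forms, physical and logical. The physical structure of a text is the structure determined by the actual positions of its basic elements (titles, paragraphs, sentences, words, punctuation marks, and so on) and can be represented with a vector space model. The logical structure is the set of logical relations that the topics, levels, paragraphs, sentences, and topic words making up the ideas of the text form at the conceptual level; it is usually represented as a tree or a graph. Automatic text structure analysis aims to have a computer divide a text into a number of mutually disjoint text units, or parse it semantically into a hierarchical structure tree, so as to recover the text's original logical structure.
     Text structure analysis matters for text understanding and text inference: only by grasping the logical structure of a text at the macro level can its topic and central idea be understood properly from a global perspective. Its results also play an important role in natural language processing tasks such as automatic summarization, passage-based information retrieval, and topic detection and tracking. However, text structure can usually be obtained only after the context has been understood, and language understanding is beyond the current capability of computers. Having a computer analyze the organizational structure of a text accurately without understanding its context is therefore a very difficult problem.
     Drawing on theories of discourse organization and the characteristics of text organizational structure, this thesis recasts text structure analysis as either a linear structure analysis task or a hierarchical structure analysis task. It first studies methods for computing semantic relevancy between words, recognizing semantic relations between sentences, and computing semantic relevancy between sentences, and uses them to analyze and quantify the relevance of content across a text's context. On that basis, it studies the key techniques and methods for linear and hierarchical structure analysis.
     Specifically, the novel contributions of this thesis are as follows:
     (1) An abstract description of text structure. Linguistic concepts such as 'sentence', 'title', 'natural paragraph', 'article', and 'topic/subtopic' are given formal notation. Hierarchical structure concepts and their representations, including 'basic argument structure', 'recursive argument structure', 'text structure tree', and 'text topic unit', are proposed to describe patterns of text organization abstractly. The 'level of a topic' and the 'granularity of a topic node' are defined and computed quantitatively, to characterize how much content a topic node in the text structure tree covers.
     (2) Semantic relevancy relations between words and their computation. After analyzing how word relevancy and word similarity relate as concepts, the thesis proposes the notion of broad-sense semantic relevancy between words and a method for computing it. First, starting from extensional logic, it proposes a corpus-based method that computes narrow-sense word relevancy by constructing a bipartite graph of word semantic relations. Second, building on the intensional logic model of Chinese concepts, it proposes a word relevancy method based on dictionary definitions and the expansion of definition items; its results emphasize how words are associated in their intensional concepts. The two results are then fused to obtain the broad-sense relevancy. Evaluation on the standard Chinese version of the M&C test set shows that the fused relevancy combines the strength of extensional logic in characterizing entity classification with the strength of intensional logic in characterizing the intensional attributes that Chinese makes salient, and that its results are closer to human cognition and judgment.
     (3) Semantic relations between sentences in discourse context and their relevancy computation. First, based on the inter-sentence semantic relations summarized by linguists and their corresponding lexical markers, the thesis proposes a method for automatically recognizing inter-sentence semantic relations in context (a qualitative method), comprising the acquisition of lexical marker templates, the resolution of conflicts between templates, and a recognition algorithm; experiments verify the method's validity and recognition performance. Second, it proposes a method for computing relevancy between sentences based on broad-sense word relevancy (a quantitative method). Experiments show that the resulting sentence relevancy scores are closer to human understanding and judgment than sentence similarity scores.
     (4) Linear structure analysis. Using the broad-sense word relevancy method and the inter-sentence relation analysis and relevancy computation methods, the thesis studies problems in linear text structure analysis and proposes a text segmentation method based on content relevance analysis. Experiments show that the method outperforms the classic TextTiling algorithm and also the Chinese-oriented text segmentation algorithms reported in the existing literature.
     (5) Hierarchical structure analysis, under the assumption that texts of the same type share the same or a similar organizational pattern. On this basis, the thesis proposes a hierarchical structure analysis method based on the Naïve Bayes model: the model learns organizational patterns from training texts, and a text of the same type is then analyzed by recursively merging units bottom-up according to the learned patterns until a text structure tree with a single root node is produced. The thesis also proposes a text structure analysis method based on biological sequence alignment algorithms, which likewise learns organizational patterns from training texts. Experimental results show that both methods are effective to some degree and that, on the current test data set, the former performs better than the latter.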
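Items (2) to (4) can be pictured with a small sketch. The abstract does not give the fusion formula, the sentence-level aggregation, or the boundary criterion, so everything below is an assumption made for illustration: the toy score tables, the linear interpolation weight `alpha`, the best-match averaging in `sentence_relevancy`, and the fixed `threshold` in `segment` are hypothetical and are not the thesis's actual method.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy scores. In the thesis the first kind of score comes from a
# corpus-built bipartite graph and the second from dictionary definitions.
NARROW: Dict[FrozenSet[str], float] = {
    frozenset({"bank", "loan"}): 0.8,
    frozenset({"river", "water"}): 0.5,
}
DICTIONARY: Dict[FrozenSet[str], float] = {
    frozenset({"bank", "loan"}): 0.6,
    frozenset({"river", "water"}): 0.9,
}


def broad_relevancy(w1: str, w2: str, alpha: float = 0.5) -> float:
    """Fuse the two word-level scores (assumed: linear interpolation)."""
    if w1 == w2:
        return 1.0
    key = frozenset({w1, w2})
    return alpha * NARROW.get(key, 0.0) + (1 - alpha) * DICTIONARY.get(key, 0.0)


def sentence_relevancy(s1: List[str], s2: List[str]) -> float:
    """Score two tokenized sentences (assumed: symmetric best-match average)."""
    if not s1 or not s2:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        # For each word in a, take its most relevant word in b, then average.
        return sum(max(broad_relevancy(x, y) for y in b) for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def segment(sentences: List[List[str]], threshold: float = 0.3) -> List[int]:
    """Return each index i where a boundary falls between sentence i and i+1.

    Assumed criterion: adjacent sentences whose relevancy drops below the
    threshold belong to different segments.
    """
    return [
        i
        for i in range(len(sentences) - 1)
        if sentence_relevancy(sentences[i], sentences[i + 1]) < threshold
    ]
```

For example, `segment([["bank", "loan"], ["loan", "bank"], ["river", "water"], ["water"]])` places its only boundary after the second sentence (index 1), because no word pair across that gap has a nonzero score in the toy tables.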
Text structure has two forms, physical and logical. The physical structure of a text is determined by the actual positions of its basic components (titles, paragraphs, sentences, words, punctuation marks, etc.) and can be represented with a vector space model. The logical structure is the set of logical relations formed, at the level of conceptual meaning, by the topics, levels, paragraphs, sentences, and keywords that express the central idea of the text; it is usually represented as a tree or a graph. Automatic text structure analysis uses a computer to divide a text into a number of disjoint text units (semantic paragraphs), or to parse it into a hierarchical tree based on meaning, so that the original logical structure of the text can be recovered.
     Automatic text structure analysis is a significant step toward automatic text understanding, because the topic and central idea of an article can be understood from an overall perspective only when its logical organization is grasped at the macro level. Its results also have an important influence on many other natural language processing tasks, such as automatic text summarization, information retrieval, and topic detection and tracking. However, text structure analysis normally rests on understanding the text, which is beyond the capability of computers. It is therefore difficult for a computer to analyze the logical structure of a text accurately without understanding its context.
     Based on the theory of text organizational structure and the characteristics of text structure, this thesis divides text structure analysis into two tasks: linear structural analysis and hierarchical structural analysis. We first propose approaches to calculate the degree of semantic relevancy between words in Chinese, to recognize the semantic relations between sentences in a context, and to calculate the degree of semantic relevancy between sentences, and we use them to analyze and quantify the relevancy of content across the context. Building on this relevancy analysis, we then study in depth the theories and methods of linear and hierarchical structural analysis. Specifically, the thesis makes the following contributions:
     (1) An abstract description of text structure. The concepts of 'sentence', 'title', 'paragraph', 'article', 'topic or sub-topic', etc. are described formally, and new concepts such as 'basic argument structure', 'recursive argument structure', 'text-structure tree', and 'text-topic unit' are proposed and described formally. A method is also presented to quantitatively describe and calculate 'the level of a topic' and 'the granularity of a topic'. Together these provide the basis for carrying out structure analysis.
     (2) Semantic relevancy relations and relevancy degree between words. By analyzing the relationship between relevancy and similarity between words, we propose the concept of 'broad-sense relevancy degree' between word meanings. To calculate it, we first propose a corpus-based method that calculates the semantic relevancy degree between words by constructing a bipartite graph of lexical semantic relations; this result is the narrow-sense relevancy degree. Second, based on the Concept Intensional Logic Model of the Chinese language, we propose a method that calculates the semantic relevancy degree between words from the definitions of a lexical item and its sub-items in a dictionary. Its results emphasize the relevancy between words in their conceptual meanings. Finally, we combine the two results to form the broad-sense relevancy degree. Tests on the standard Chinese version of the M&C data set show that the two approaches complement each other and that the combined broad-sense relevancy degree is close to human cognition and judgment.
     (3) Semantic relations and relevancy between sentences in a context. First, drawing on the inter-sentence semantic relations and corresponding word-form markers summarized by linguists, we propose a qualitative method to automatically recognize the semantic relations between sentences in a context. It comprises an approach to obtain templates of word-form markers, an approach to resolve conflicts between templates, and an algorithm to recognize inter-sentence semantic relations. Experiments test the method's validity and effectiveness. Second, we propose a quantitative method that calculates the relevancy degree between sentences from the broad-sense semantic relevancy between words. Tests show that the resulting relevancy degrees are closer to human judgment than the results of existing sentence similarity calculation.
     (4) Linear structural analysis of discourse and related issues. Building on the methods above for calculating the broad-sense relevancy degree between words and for analyzing and calculating the semantic relevancy between sentences, we study linear structural analysis of texts and present a linear text segmentation method based on content relevancy analysis in context. Tests show that our method segments texts better than the classic TextTiling algorithm and better than the text segmentation algorithms previously reported for Chinese texts.
     (5) Hierarchical structural analysis and related issues, under the assumption that texts of the same type have the same or similar structural patterns. We first propose a hierarchical text structure analysis method based on the Naïve Bayes model: the model learns organizational structure patterns from the training corpus, and nodes are then merged upward recursively until a text structure tree with a single root node is generated. We also propose a hierarchical text structure analysis method based on a biological sequence alignment algorithm: using sequence alignment, it finds the training text whose structure is most similar to that of the test text and acquires its structural pattern, so that the structure of the test text can be analyzed automatically according to that pattern. The test results show that both methods are effective to some extent and that, on the current test data set, the former performs better than the latter.
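The alignment-based method in item (5) can be sketched as follows, assuming each text is reduced to a sequence of structural labels and that similarity is the global alignment score of the Needleman-Wunsch algorithm, a standard biological sequence alignment technique. The abstract does not say which alignment algorithm, label set, or scoring scheme the thesis uses; the labels, the +1/-1/-1 scores, and the function names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def alignment_score(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> int:
    """Needleman-Wunsch global alignment score of two label sequences."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def most_similar(
    test: Sequence[str], training: List[Tuple[str, Sequence[str]]]
) -> str:
    """Return the name of the training text whose structure aligns best."""
    return max(training, key=lambda item: alignment_score(test, item[1]))[0]


# Hypothetical structural label sequences for two training texts.
TRAINING = [
    ("story", ["scene", "event", "event", "ending"]),
    ("editorial", ["intro", "claim", "conclusion"]),
]
TEST_TEXT = ["intro", "claim", "evidence", "conclusion"]
BEST = most_similar(TEST_TEXT, TRAINING)
```

Here `BEST` is `"editorial"`: the test sequence aligns with it using three matches and one gap, while it shares no labels with the other training text. The structural pattern of the retrieved text would then guide the analysis of the test text.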

