Research on Several Issues in Chinese Term Extraction
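Abstract
Terms are the concentrated carriers of specialized knowledge; their creation, spread, and disappearance dynamically reflect the development and evolution of a discipline. As a knowledge source, a domain terminology database gives researchers of all kinds convenient access to specialized knowledge. Automatic term extraction is the key technology for building such databases and is also a basic topic in natural language processing, supporting many lines of research including machine translation, document summarization, information retrieval, text classification, and dictionary compilation.
     This dissertation goes beyond the restriction to noun phrases, admits domain terms with non-nominal structures, and broadens the linguistic rules. Combining empirical analysis with machine learning strategies, it studies three aspects: structural completeness, domain relevance, and word collocation. The main work includes:
     1. Taking the word as the minimal linguistic unit, a database of more than 40,000 computer-science terms is built. Based on the distribution characteristics of terms of different lengths, machine learning methods are used to extract the lexical features of term structure from multiple angles. This enriches the linguistic rules, extends their coverage, and improves the recall of term extraction.
     2. Since single-word terms have a simple structure and clear boundaries, a recognition algorithm based on fuzzy clustering is proposed. Term recognition is cast as a binary classification task, and single-word terms are automatically clustered and labeled without a specialized dictionary or multiple corpora.
     3. Unlike existing methods, which merge from a single parent string to its many substrings, this dissertation starts from the inverse mapping between a single substring and its multiple parent strings and proposes a substring reduction algorithm based on an independence statistic, which is used to judge the structural completeness of candidate terms. Experiments show that, in O(n) time, the algorithm not only removes ordinary substrings but also filters out the interference caused by common substrings, shrinking the candidate term set by 29.44%.
     4. Taking the word-formation capacity of non-nominal words as the object of study, the concept of Word Active Degree (WAD) is proposed. Combined with the cohesion between words, it is used to analyze the collocation characteristics of the words inside a phrase and to filter out phrases with ill-formed collocations or with an excessively active local component. Experiments show that WAD, used as the collocation criterion, is effective at identifying errors caused by verb-object phrases and prepositional phrases, with a precision as high as 99.97%.
     5. Based on the difference between how the distributions of terms and non-terms vary across a corpus, and combining local and global features, a domain relevance measure based on distribution-variation features is proposed. Experiments show that the method greatly reduces computational complexity and markedly improves the ranking of low-frequency terms and basic terms in the output.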
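The second contribution treats single-word term recognition as a two-way split produced by fuzzy clustering. The abstract does not say which features are clustered, so the following is only a minimal sketch under stated assumptions: each word carries one hypothetical numeric score (`scores`, invented here for illustration), and a plain two-cluster fuzzy c-means over those scores decides which words fall on the term side. The function names and the 0.5 membership cut-off are also assumptions, not the dissertation's method.

```python
def memberships_for(values, centers, m):
    """Fuzzy c-means membership of every value in every cluster."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            hit = dists.index(0.0)
            rows.append([1.0 if k == hit else 0.0 for k in range(len(centers))])
        else:
            rows.append([1.0 / sum((dk / dj) ** exponent for dj in dists)
                         for dk in dists])
    return rows


def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-9):
    """Cluster 1-D values; returns (centers, memberships)."""
    lo, hi = min(values), max(values)
    if c < 2 or lo == hi:
        raise ValueError("need at least two clusters and two distinct values")
    # Deterministic start: spread the centers evenly over the data range.
    centers = [lo + (hi - lo) * k / (c - 1) for k in range(c)]
    for _ in range(iters):
        rows = memberships_for(values, centers, m)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in rows]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(values, centers, m)


def label_terms(scores, cutoff=0.5):
    """Mark a word as a term when it mostly belongs to the higher-scoring cluster."""
    words = list(scores)
    centers, rows = fuzzy_c_means_1d([scores[w] for w in words])
    term_cluster = centers.index(max(centers))
    return {w: row[term_cluster] > cutoff for w, row in zip(words, rows)}


# Hypothetical per-word scores, invented purely for illustration.
example_scores = {"compiler": 0.92, "kernel": 0.88, "bytecode": 0.95,
                  "today": 0.10, "people": 0.05, "very": 0.12}
print(label_terms(example_scores))
```

Because the clustering is unsupervised, a sketch like this needs no term dictionary; how well it separates terms depends entirely on the quality of the score that is fed in.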
As carriers of domain knowledge, terms reflect the development and evolution of a discipline through their creation, popularization, and extinction. As a knowledge source, domain term databases offer a convenient and quick way to acquire professional knowledge. Automatic term extraction is not only one of the critical technologies for constructing domain term databases but also a basic topic in natural language processing, and it supports much other research, such as machine translation, information retrieval, automatic abstracting, text classification, and dictionary compilation.
     In this dissertation, the author goes beyond the restriction to noun phrases, accepts terms with other structures, and widens the linguistic rules. Combining empirical analysis with machine learning strategies, the research focuses on term structure integrity, domain relevance, and collocation, and makes the following contributions:
     Firstly, a computer term database containing more than 40,000 items is constructed, taking the word as the minimal linguistic unit. Based on the distribution features of terms of different lengths, morphological rules of term structure are derived with machine learning methods. Enriching the linguistic rules enlarges their coverage and improves recall.
     Secondly, a single-word term recognition approach based on fuzzy clustering is proposed, exploiting the simple structure and unambiguous boundaries of such terms. The recognition process is turned into a binary classification task. Without specific dictionaries or multiple corpora, single-word terms can be tagged automatically by the clustering algorithm.
     Thirdly, a substring reduction algorithm based on an independence statistic is proposed to estimate the structural integrity of candidates. Unlike current methods, which map a single parent string to its many substrings, this algorithm captures the links between a string and its parent strings. In experiments, 29.44% of the candidates are filtered out in O(n) time. Besides ordinary fragmentary substrings, noise from common substrings is also recognized.
     Fourthly, the concept of word active degree (WAD) is proposed to evaluate the collocation ability of non-noun words. Combined with the cohesion between words, this parameter measures how appropriately the words in a phrase collocate and removes ill-formed collocations and phrases with an excessively active segment. In experiments, WAD shows a strong ability to identify the errors caused by verb-object phrases and prepositional phrases, and the precision reaches 99.97%.
     Finally, based on the difference in distribution between terms and non-terms, a domain relevance measure based on the local distribution variation feature is proposed and combined with a global coverage feature. In experiments, this method effectively improves the ranking of low-frequency terms and basic terms at low computational complexity.
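The substring reduction step looks from each candidate string up to the longer candidates that contain it. The abstract does not define the independence statistic, so the sketch below swaps in a simple stand-in: the share of a candidate's occurrences that are not nested inside an occurrence of a longer candidate. The function names, the 0.5 threshold, and the toy English sentences are all assumptions made for illustration; this version also rescans every sentence and makes no attempt at the O(n) bound reported above. For Chinese text the token lists would come from a word segmenter.

```python
from collections import defaultdict


def find_spans(sentence, candidates):
    """Return (start, end, candidate) for each candidate occurrence in a token list."""
    lengths = {len(c) for c in candidates}
    spans = []
    for start in range(len(sentence)):
        for n in lengths:
            gram = tuple(sentence[start:start + n])
            if len(gram) == n and gram in candidates:
                spans.append((start, start + n, gram))
    return spans


def independence(sentences, candidates):
    """Share of each candidate's occurrences not covered by a longer candidate."""
    total = defaultdict(int)
    free = defaultdict(int)
    for sentence in sentences:
        spans = find_spans(sentence, candidates)
        for s, e, cand in spans:
            total[cand] += 1
            nested = any(ps <= s and e <= pe and (pe - ps) > (e - s)
                         for ps, pe, _ in spans)
            if not nested:
                free[cand] += 1
    return {c: free[c] / total[c] for c in total}


def reduce_substrings(sentences, candidates, threshold=0.5):
    """Keep candidates that occur on their own often enough."""
    scores = independence(sentences, candidates)
    return {c for c in candidates if scores.get(c, 0.0) >= threshold}


# Toy data, invented for illustration.
sentences = [
    "the operating system kernel schedules tasks".split(),
    "an operating system manages memory".split(),
    "the file system stores data".split(),
    "the operating system boots".split(),
]
candidates = {
    ("operating",), ("system",),
    ("operating", "system"), ("file", "system"),
    ("operating", "system", "kernel"),
}
print(sorted(reduce_substrings(sentences, candidates)))
```

In this toy run, `("operating",)` is an ordinary fragment of one parent, while `("system",)` is a common substring shared by several parents; neither ever occurs outside a longer candidate, so both score 0 and are dropped.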
