基于蒙古文语料库的人名自动识别
摘要
蒙古文人名的自动识别是命名实体识别的子任务之一。
     中、英文信息处理经历了半个世纪的发展,在基础资源的建设、词性标注、信息检索、文本分类、机器翻译、语言识别与合成、人机对话等领域都取得非常大的发展,中、英文信息处理的现代化发展,对国内少数民族语言信息处理的理论与技术发展也起到了深刻的促进作用。
     与中、英文信息处理相比,蒙古文信息处理虽然起步稍晚,但也取得了少数民族信息处理领域的辉煌成就。蒙古文信息处理已初步完成了字、词处理阶段,现已进入句处理阶段,蒙古文信息处理已完成短语结构关系识别、短语边界界定等浅层句法分析任务,正向深层句法分析迈进,蒙古文信息检索、自动文摘、文本分类、机器翻译的研究也方兴未艾。
     蒙古文词法分析与标注对短语、句法、语义、篇章的研究具有重要意义,不过作为基础环节的词法分析与标注,在未登录词,尤其是命名实体的识别研究未能繁荣发展。命名实体识别上的欠缺始终影响着词法分析的精度,并进而影响短语分析、句法分析、信息检索、机器翻译等领域的发展。
     专有名词是语料库的重要组成部分,专有名词识别技术的突破是提高蒙古文词法分析正确率及其他后续工作的重要基础,歧义和未登录词的识别是影响切分精度的两大障碍,未登录词包括新词和人名、地名等命名实体。本文作为蒙古文人名自动识别的研究成果,涉及普通人名及兼类人名的识别,因而我们的研究具有相当高的学术价值及应用价值。
     蒙古文本中人名数量众多,兼类现象较为普遍,研究蒙古人名的论述较少,尚无太多现成的理论与技术可供参考,因而蒙古文人名识别遇到很多难题,主要表现在:
     ☆人名是开放集合,无法采取穷举方法。蒙古族人名兼类现象较为严重,越普通的词,成为人名的现象也越普遍,名词、动词、形容词、数词、时间词、副词、代词、模拟词都能成为人名,这给人名识别带来很大困难。
     ☆蒙古文深加工语料库规模比起中、英文规模尚小,这必定影响到统计方法的运用。目前内蒙古大学已储备了200万词规模深加工语料库,而我们使用26万词规模语料库,语料库的规模使规则提取及机器学习受到一定限制。
     ☆专有名词的识别一直是蒙古文词法分析与标注的难点问题,但人名易与地名及其他专有名词兼类,因而专有名词之间的兼类问题也是困扰我们的难点问题。
     本文采用了最大熵的统计方法识别蒙古文人名,在传统的规则为主的研究基础上,将最大熵的数学模型成功应用于蒙古文命名实体的识别当中,实现了蒙古文人名自动识别系统。本文的创新和贡献主要体现在:
     ◇首次建立了蒙古文人名识别语料库
     目前,蒙古文语料库已具备了一定的规模,这对蒙古文信息处理的繁荣发展起到良好的推动作用。不过迄今为止,国内外还没有建立专门面向蒙古文人名识别的语料库,我们从网络抓取5773个蒙古文人名句,与内蒙古大学的语料库一同训练识别模型,测试自动识别的结果,有效补充了语料库缺乏带来的缺憾。
     ◇系统研究了蒙古族人名的内外部结构
     我们深入分析了蒙古人名的民族特征、时代特征、地域特征、性别特征,深入总结了蒙古文人名的内部组成模式,对蒙古族人名的结构类型及特点,对蒙古族特有的蒙古姓氏及其来源进行解读。
     ◇提出了蒙古文语料库标注及转写规范
     我们在对蒙古文语料库的标注现状进行分析的基础上,提出了“语料库用现代蒙古语标注规范”,并针对汉语人名标注的诸多问题,以蒙古文标注外来词的固定习惯为基础,以《现代蒙古语语料库标注规范》为参考,提出了详尽的“汉语人名的拉丁转写方案”。
     ◇建立人名识别的知识库
     我们为自动识别蒙古文人名,建立了包括“汉语姓氏词典、蒙古姓氏词典、蒙古族普通人名词典、汉语姓氏拉丁映射表、汉语人名拉丁映射表、梵藏满人名词典、著名人物词典、人名指示词库、地名词典、地名后缀词典、机构名后缀词典”等词典或映射表的普通人名知识库,建立了包含“兼类人名词典、兼类词搭配词典、蒙古人名词干词典”等知识的兼类人名知识库。
     ◇设计并实现了蒙古文人名自动识别系统
     实验证明,作为国内外在蒙古文命名实体识别中较早运用统计方法的学术成果,本研究封闭测试的正确率94.56%,召回率85.15%,F值89.61%,取得了较为满意的识别效果。
Abstract
The automatic recognition of Mongolian personal names is one of the subtasks of named entity recognition.
     Over half a century of development, Chinese and English information processing has made great progress in basic resource construction, part-of-speech (POS) tagging, information retrieval, text categorization, machine translation, speech recognition and synthesis, and human-machine dialogue. This progress has in turn strongly stimulated the theoretical and technological development of minority language information processing in China.
     Compared with Chinese and English information processing, Mongolian information processing started somewhat later, but it has produced notable achievements within minority language information processing. It has largely completed the character and word processing stages and has now entered the sentence processing stage. Having completed shallow syntactic analysis tasks such as identifying phrase structure relations and delimiting phrase boundaries, it is moving toward deep syntactic analysis. Research on Mongolian information retrieval, automatic summarization, text categorization and machine translation is also growing.
     Mongolian lexical analysis and tagging is of great importance to research on phrases, syntax, semantics and discourse. However, within this foundational step, the study of unknown words, and of named entity recognition in particular, has not kept pace. Weak named entity recognition continues to limit the accuracy of lexical analysis and, in turn, the development of phrase analysis, syntactic analysis, information retrieval and machine translation.
     Proper nouns are an important part of a corpus, and a breakthrough in proper noun recognition is an important foundation for improving the accuracy of Mongolian lexical analysis and of subsequent work. Ambiguity and unknown words are the two major obstacles to segmentation accuracy; unknown words include new words and named entities such as personal names and place names. As a study of automatic Mongolian personal name recognition, the present paper covers the recognition of both ordinary names and multi-category names, and therefore has considerable academic and practical value.
     Mongolian texts contain a large number of personal names, multi-category names are common, and there are few studies of Mongolian names, so little ready-made theory or technology is available for reference. Mongolian name recognition therefore faces many difficulties, chiefly the following:
     ☆Personal names form an open set, so exhaustive enumeration is impossible. Multi-category use is a serious problem in Mongolian names: the more common a word is, the more likely it is to be used as a name. Nouns, verbs, adjectives, numerals, temporal words, adverbs, pronouns and mimetic words can all serve as names, which makes name recognition very difficult.
     ☆The deeply processed Mongolian corpus is still small compared with those for Chinese and English, which inevitably affects the use of statistical methods. Inner Mongolia University currently holds a deeply processed corpus of 2 million words, but we used a corpus of 260 thousand words; this scale places some limits on rule extraction and machine learning.
     ☆Proper noun recognition has long been a difficult point in Mongolian lexical analysis and tagging. Personal names are easily confused with place names and other proper nouns, so category overlap among proper nouns is a further difficulty.
     The present paper uses the statistical method of maximum entropy to recognize Mongolian personal names. Building on earlier, mainly rule-based research, it applies the maximum entropy model to Mongolian named entity recognition and implements an automatic recognition system for Mongolian personal names. The innovations and contributions of the paper are mainly as follows:
     ◇Building a Mongolian name recognition corpus for the first time
     Mongolian corpora have now reached a certain scale, which has helped Mongolian information processing develop. So far, however, no corpus dedicated to Mongolian name recognition has been built at home or abroad. We collected 5,773 sentences containing Mongolian personal names from the web and used them together with the Inner Mongolia University corpus to train the recognition model and test the automatic recognition results, which helped compensate for the lack of corpus data.
     ◇Systematically studying the internal and external structure of Mongolian names
     The author analyzes in depth the ethnic, period, regional and gender characteristics of Mongolian names, summarizes their internal composition patterns, explains the structural types of Mongolian names and their features, interprets the distinctively Mongolian surnames and their origins, and lists the Chinese surnames used by Mongolian people.
     ◇Proposing a tagging specification and a transliteration specification for the Mongolian corpus
     Based on an analysis of the current state of Mongolian corpus tagging, the author proposes the Contemporary Mongolian Tagging Specification for Corpus. To address the many problems in tagging Chinese names, the author also formulates a detailed Latin Transliteration Scheme for Chinese Names, which is based on the established Mongolian practice for tagging loanwords and takes the Specifications for Contemporary Mongolian Corpus Annotation as a reference.
     ◇Setting up the knowledge base for name recognition
     To recognize Mongolian names automatically, the author builds a knowledge base for ordinary names, consisting of dictionaries and mapping tables: Chinese Surnames Dictionary, Mongolian Surnames Dictionary, Dictionary of Mongolian Common Names, Latin Mapping Table for Chinese Surnames, Latin Mapping Table for Chinese Names, Dictionary of Sanskrit, Tibetan & Manchu Names, Famous Names Dictionary, Word Bank of Name Deixis, Place Names Dictionary, Suffix Dictionary for Place Names, and Suffix Dictionary for Organization Names. The author also builds a knowledge base for multi-category names, consisting of the Multi-category Names Dictionary, Collocation Dictionary for Multi-category Words, and Stem Dictionary for Mongolian Names.
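     One common way to use dictionaries like these with a statistical classifier is to turn dictionary membership into binary features of each token and its neighbours. The sketch below illustrates that idea only; it is not the thesis's actual feature set. The dictionary keys, the sample entries and the function name `dictionary_features` are all hypothetical placeholders.

```python
from typing import Dict, List, Set

# Hypothetical, tiny stand-ins for a few of the dictionaries listed above.
# The entries are placeholder strings, not data from the thesis.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "mongolian_common_name": {"batu", "temur"},
    "name_indicator": {"bagsi", "darga"},
    "place_suffix": {"hota"},
}


def dictionary_features(tokens: List[str], i: int) -> Dict[str, int]:
    """Return binary features for token i based on dictionary membership
    of the previous, current and next token."""
    feats: Dict[str, int] = {}
    for offset, label in ((-1, "prev"), (0, "cur"), (1, "next")):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j].lower()
            for dict_name, entries in KNOWLEDGE_BASE.items():
                if word in entries:
                    feats[f"{label}_in_{dict_name}"] = 1
    return feats


if __name__ == "__main__":
    sentence = ["batu", "bagsi", "irebe"]
    print(dictionary_features(sentence, 0))
```

     Features of this kind could then be passed to a maximum entropy classifier, which would learn a weight for each one; how the thesis combines its dictionaries with the model is not stated in this abstract.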
     ◇Designing and implementing an automatic recognition system for Mongolian names based on maximum entropy
     Experiments show that this study, one of the earlier applications of statistical methods to Mongolian named entity recognition at home or abroad, achieves a precision of 94.56%, a recall of 85.15% and an F-value of 89.61% in the closed test, a fairly satisfactory recognition result.
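     The three figures above are mutually consistent if the F-value is taken to be the usual balanced F-measure, the harmonic mean of precision and recall, and the first figure is read as precision. That reading is an assumption, since the abstract does not define the F-value. A minimal check, with `f_measure` as an illustrative helper name:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


if __name__ == "__main__":
    # Values reported in the abstract, in percent.
    p, r = 94.56, 85.15
    print(round(f_measure(p, r), 2))  # 89.61
```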
