Research on Intelligent Assisted Text Quality Control Technology for Chinese Newspaper Publishing
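Abstract
Since laser phototypesetting for Chinese characters came into use, the level of informatization in Chinese news publishing has advanced rapidly. In recent years, Chinese newspaper publishing in China has grown steadily in scale, and production stages at newspapers such as news gathering and editing, page makeup, printing, finance, and distribution have been digitized. However, the quality control stage of the production workflow still handles each day's news manuscripts and pages entirely by hand. It is inefficient and costly, and has become the bottleneck of newspaper production.
     Starting from the current state of newspaper publishing and its problems, this thesis takes optimization of the newspaper production workflow as its entry point and automatic text error checking and repetition detection as its means, with the aim of achieving intelligent assisted text quality control for newspaper publishing. The main results are as follows:
     1. The existing newspaper production workflow and related software are integrated and optimized, and a conceptual framework and a technical framework for digital, intelligent assisted text quality control are proposed. The optimized workflow provides a digital platform on which people and computers cooperate on quality control, and it also gives the computer a closed-loop learning environment in which it continually learns new words and linguistic knowledge from historical manuscripts. This knowledge is in turn applied in the error checking and repetition detection algorithms based on lexical semantic classes, so the computer can assist manual quality control with a relatively high degree of intelligence.
     2. To use lexical semantics for error checking at the semantic level, a method for dividing a semantic classification system of Chinese content words oriented to error checking is proposed, together with a method for obtaining seed words. A seed-word-based algorithm for automatically acquiring the semantic classes of Chinese content words is also proposed. Using two kinds of features, syntactic features and word-formation morphemes, it automatically acquires semantic class labels for content words from a large-scale unsegmented corpus. The algorithm can acquire multiple semantic classes for polysemous words and can identify sentiment words. A Chinese lexical analysis process based on lexical semantic classes is given, in which a conditional random field model labels the semantic classes of words and identifies noun phrase boundaries.
     3. Based on the types of textual errors in news manuscripts and their causes, and targeting the grammatical, semantic, and context-inconsistency errors left unsolved by research on automatic Chinese proofreading, four error checking algorithms for different error types are proposed. The semantic-class 3-gram algorithm uses abnormal adjacency between lexical semantic classes to find real-word substitution errors that ordinary error checking algorithms cannot detect, as well as some grammatical and semantic errors. The selectional-preference-based algorithm uses a verb's semantic selectional preferences for its subject and object to find long-distance verb-object or subject-predicate collocation errors. The pointwise-mutual-information-based algorithm for complex-sentence structure and punctuation uses co-occurrence probabilities between complex-sentence conjunctions and punctuation marks to find grammatical and punctuation errors. Name-title inconsistency detection compares person-name and title pairs to find names or titles that are inconsistent across a text.
     4. To meet the need for automatic updating of historical manuscripts in repetition detection, a workflow and a concrete algorithm for repetition detection are proposed. The algorithm first classifies historical manuscripts by general topic and clusters the manuscripts within each general topic. During online repetition detection, the manuscript under test is first assigned to the appropriate event cluster according to its first paragraph; full-text features are then used within that event cluster to judge whether it is a repetition. The algorithm performs automatic updating of historical manuscripts and repetition detection at the same time, and it improves detection precision through similarity comparison between paragraphs.
     The application system based on the optimized production workflow went live at Changjiang Daily (《长江日报》) and has run for more than two years, which has demonstrated its advantages in efficiency and cost. The great majority of the automatic error checking and repetition detection algorithms proposed in this thesis have also been applied in the system.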
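As a rough illustration of the semantic-class 3-gram check described in result 3, the Python sketch below counts class 3-grams in correctly written text and flags positions in a new sentence where the class sequence was rarely or never seen. The class labels, the function names, and the `min_count` threshold are assumptions made for this example; the abstract does not say how the thesis scores adjacency anomalies.

```python
from collections import Counter

PAD = "<s>"  # hypothetical sentence-start marker


def train_class_trigrams(tagged_sentences):
    """Count class 3-grams over sentences given as lists of semantic-class labels."""
    counts = Counter()
    for classes in tagged_sentences:
        padded = [PAD, PAD] + list(classes)
        for i in range(len(padded) - 2):
            counts[tuple(padded[i:i + 3])] += 1
    return counts


def flag_anomalies(classes, counts, min_count=1):
    """Return indices of tokens that end a class 3-gram seen fewer than min_count times."""
    padded = [PAD, PAD] + list(classes)
    flagged = []
    for i in range(len(classes)):
        # padded[i:i + 3] is the 3-gram whose last element is classes[i]
        if counts[tuple(padded[i:i + 3])] < min_count:
            flagged.append(i)
    return flagged


# Toy training data: class sequences from text assumed to be correct.
counts = train_class_trigrams([
    ["PERSON", "ACT", "OBJECT"],
    ["PERSON", "ACT", "PLACE"],
])
print(flag_anomalies(["PERSON", "ACT", "OBJECT"], counts))  # -> []
print(flag_anomalies(["PERSON", "OBJECT", "ACT"], counts))  # -> [1, 2]
```

The sketch takes class sequences as input; mapping words to classes is a separate step, which the thesis handles with a conditional random field model. Working on classes rather than words is what lets a correctly spelled but wrongly chosen word stand out, since its class forms an unusual neighbourhood even when the word itself is common.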
The level of informatization in Chinese newspaper publishing has risen sharply since laser phototypesetting systems for Chinese characters came into use. In recent years, Chinese newspaper publishing has grown steadily in scale, and production processes such as reporting and editing, typesetting, printing, financial management, and circulation management have been digitized. However, the quality control process, which handles news texts and newspaper pages to control errors and repetitions, is still entirely manual. Because of its low efficiency and high cost, manual quality control has become the bottleneck of newspaper publishing.
     In this thesis, based on an analysis of the problems in the current newspaper publishing process, the process is adapted and several automatic error checking and repetition detection algorithms are proposed, in order to achieve intelligent aided quality control of newspaper publishing. The primary contributions are as follows:
     1. The current production process and related software are integrated and optimized, and a conceptual framework and a technical framework for intelligent aided quality control of Chinese newspaper publishing are presented. The adapted and optimized production process provides not only a digital platform for coordinated quality control by users and computers, but also a closed-loop learning environment in which the computers can learn new words and linguistic knowledge. This knowledge is then applied in the lexical-semantic-class-based error checking and repetition detection algorithms, so the computers can aid quality control with a high degree of intelligence.
     2. In order to find semantic errors in texts by using lexical semantics, a method for building a semantic classification taxonomy of content words is proposed, together with a seed-word-based algorithm for automatically acquiring the semantic classes of Chinese content words. The algorithm learns the semantic classes of content words from an unsegmented Chinese corpus, acquires multiple semantic classes for polysemous words, and identifies subjective words. A semantic-class-based Chinese lexical analysis process is presented, in which a conditional random fields model is used to label the semantic classes of segmented Chinese words and to identify noun phrase boundaries.
     3. According to error types and error causes, four algorithms are proposed to detect syntactic, semantic, and inconsistency errors, which have not been solved in traditional Chinese automatic proofreading. The semantic-class-based trigram error checking algorithm detects real-word substitution errors and some syntactic and semantic errors. The selectional-preference-based error checking algorithm detects subject-predicate and verb-object collocation errors by using the selectional preferences of verbs. The pointwise-mutual-information-based error checking algorithm detects syntactic and punctuation errors by using the pointwise mutual information between conjunctions and punctuation marks. The inconsistency checking algorithm detects inconsistencies between person names and titles within a text.
     4. To organize historical news texts automatically for repetition detection, a repetition detection algorithm is proposed. The historical news texts are first classified according to general topics and then clustered by event. For online repetition detection, the input text is first classified into a general topic and assigned to an event by using its first paragraph, and then the whole text is used to predict whether the input text is a repetition. The algorithm both organizes the historical texts automatically and detects repetitions, and the precision of repetition detection is improved by computing similarity between paragraphs of different texts.
     The application system based on the adapted and optimized production process has been in use at Changjiang Daily for more than two years, and its advantages in efficiency and cost have been demonstrated. Most of the error checking and repetition detection algorithms have been applied in the system.
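The two-stage flow in contribution 4 (assign the incoming text to an event by its first paragraph, then compare whole texts paragraph by paragraph inside that event) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the character-bigram cosine similarity, the three thresholds, the `event-N` identifiers, and the dictionary layout are all hypothetical choices made for the example, and the preceding general-topic classification stage is omitted.

```python
import math
from collections import Counter


def bigrams(text):
    """Character-bigram counts; usable on unsegmented text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def assign_event(paragraphs, events):
    """Stage 1: pick the event whose stored lead paragraphs best match the new lead."""
    lead = bigrams(paragraphs[0])
    best_id, best_sim = None, 0.0
    for event_id, articles in events.items():
        sim = max((cosine(lead, bigrams(a[0])) for a in articles), default=0.0)
        if sim > best_sim:
            best_id, best_sim = event_id, sim
    return best_id, best_sim


def is_repetition(paragraphs, articles, para_threshold=0.8, coverage=0.6):
    """Stage 2: a text repeats a stored one if enough of its paragraphs have a close match."""
    vecs = [bigrams(p) for p in paragraphs]
    for old in articles:
        old_vecs = [bigrams(p) for p in old]
        matched = sum(
            1 for v in vecs if any(cosine(v, o) >= para_threshold for o in old_vecs)
        )
        if matched / len(vecs) >= coverage:
            return True
    return False


def check_and_update(paragraphs, events, event_threshold=0.3):
    """Detect repetition and update the history in one pass.

    `paragraphs` is a non-empty list of paragraph strings; `events` maps an
    event id to a list of stored texts (each a list of paragraphs).
    """
    event_id, sim = assign_event(paragraphs, events)
    if event_id is None or sim < event_threshold:
        event_id = f"event-{len(events)}"  # hypothetical id scheme for a new event
        events[event_id] = []
    repeated = is_repetition(paragraphs, events[event_id])
    if not repeated:
        events[event_id].append(paragraphs)  # new texts extend the history
    return event_id, repeated
```

Restricting the full comparison to one event cluster keeps the number of paragraph-level comparisons small, and appending non-repeated texts to their cluster is one simple way to keep the history updated while detection runs.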
    [186] Zapirain B n, Agirre E, M`arquez L ?. Generalizing over lexical features: Selectional preferences for semantic role classification. [A]. Proceedings of the ACLIJCNLP 2009 Conference Short Papers [C]. Suntec, Singapore: Association for Computational Linguistics, 2009, 4.
    [187] Sun W, Sui Z, Wang M, Wang X. Chinese Semantic Role Labeling with Shallow Parsing [A]. Proceedings of the Conference on Empirical Methods in Natural Language Processing 2009 [C]. Singapore: 2009, 1475–1483.
    [188] Vickrey D, Koller D. Sentence Simplification for Semantic Role Labeling [A]. Proceedings of ACL-08: HLT [C]. Columbus, Ohio, USA: 2008, 344–352.
    [189] Ding B-G, Huang C-N, Huang D-G. Chinese Main Verb Identification: From Specification to Realization [J]. Computational Linguistics and Chinese Language Processing, 2005, 10(1): 42.
    [190]穗志方,俞士汶.面向EBMT的汉语单句谓语中心词识别研究[J]. JOURNAL OF CHINESE INFORMATION PROCESSING, 1998, 12(4): 39-46.
    [191]龚小谨,罗振声,骆卫华.汉语句子谓语中心词的自动识别[J]. JOURNAL OF CHINESE INFORMATION PROCESSING, 2003, 17(2): 7-13.
    [192] Berger A L, Pietra S A D, Pietra V J D. A Maximum Entropy Approach to Natural Language Processing [J]. Computational Linguistics, 1996, 22(1): 39–71.
    [193] Huang X, Acero A, Hon H-w. Spoken Language Processing: A Guide to Theory, Algrithm and System Development [M], Upper Saddle River, New Jersey: Prentice Hall PTR, 2001.
    [194] Chen A, Peng F, Shan R. Chinese Named Entity Recognition with Conditional Probabilistic Models [A]. The Third International Chinese Language Processing Bakeoff [C]. Sydney, Australia: 2006, 173-176.
    [195] Zhou J, He L, Dai X. Chinese Named Entity Recognition with a Multi-Phase Model [A]. The Third International Chinese Language Processing Bakeoff [C]. Sydney, Australia: 2006, 213-216.
    [196] Feldman R, Sanger J. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data [M], New York: Cambridge University Press, 2006.
    [197] Schapire R E, Singer Y. Boostexter: A boosting-based system for text categorization [J]. Machine Learning, 2000, 39(2-3): 35-168.
    [198] Papka R. On-line New Event Detection, Clustering and Tracking. PhD thesis [D], Department of Computer Science, University of Massachusetts, 1999.
    [199]徐新文.基于内容的新闻视频挖掘方法研究[D],长沙,湖南:国防科技大学, 2009.
    [200] Bethard S, Yu H, Thornton A, Hatzivassiloglou V, Jurafsky D. Automatic Extraction of Opinion Propositions and their Holders [A]. In Proceedings of the AAAI Spring Symposium [C]. Stanford, USA: 2004,
    [201]洪宇,张宇,范基礼,刘挺,李生.基于子话题分治匹配的新事件检测[J].计算机学报, 2008, 31(4): 687-695.
