Dependency Parsing Based Semantic Role Labeling (基于依存句法分析的语义角色标注)

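English title: Dependency Parsing Based Semantic Role Labeling
Author: Hu Yuxuan (胡禹轩)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords (Chinese): 语义角色标注 ; 依存句法分析 ; 重排序
Keywords (English): semantic role labeling ; dependency parsing ; re-ranking
Degree year: 2009
Advisor: Liu Ting (刘挺)
Discipline code: 081203
Degree-granting institution: Harbin Institute of Technology (哈尔滨工业大学)
Thesis submission date: 2009-06-01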

Abstract

As computing power has grown and theories such as statistical machine learning have matured, shallow semantic parsing has attracted increasing attention from researchers. Semantic role labeling (SRL) is one way of realizing shallow semantic parsing: the problem is clearly defined, the data is convenient to annotate and evaluate by hand, and the range of potential applications is broad.
     Syntactic parsing occupies a central position in the deep processing of language and is the most direct foundation for shallow semantic parsing. Among syntactic formalisms, dependency grammar has drawn growing interest because it is simple in form, easy to annotate, and convenient to apply. Working on top of the word segmentation of a sentence, dependency parsing introduces no new phrase nodes; structural information is attached to the relations between words, so the parse is comparatively simple. The resulting trees tend to be flat and shallow, which shortens the distance between nodes, simplifies the system, and makes relations between nodes easier to study. Words that are far apart in the linear order of a sentence may stand in a close, or even direct, dependency relation, which helps in understanding sentence structure at the level of meaning.
     This thesis implements an SRL system based on dependency parsing. It divides the task into four relatively separate parts: predicate identification, predicate classification (word sense disambiguation of predicates), semantic role identification and classification, and generation of the final labeling. The system took part in the CoNLL2008 shared task evaluation, where it achieved an F-score of 78.52 and took second place.
     In conventional SRL systems, the stage that generates the final labeling uses only, or mainly, the context of each role and of the relation between the role and its predicate. It does not exploit interactions among the different roles of the same predicate, that is, global information about the predicate frame. Building on the CoNLL2008 system, this thesis uses beam search over the output probabilities of the role identification and classification stage to generate several good candidate labelings, and then re-ranks the candidates with a log-linear model trained with the Online Passive-Aggressive algorithm. Re-ranking yields a further performance gain of 0.2%.
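     The candidate generation and re-ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the role labels ("A0", "A1", "NONE"), the feature names, the beam size, the aggressiveness parameter C, and the choice of the PA-I update variant are all assumptions made for illustration, since the abstract does not specify them. Because only the top-ranked candidate matters, the log-linear model is scored with its linear form w·f without normalization.

```python
import heapq
import math
from collections import defaultdict


def beam_search(role_probs, beam_size=3):
    """Generate the best labelings for one predicate.

    role_probs holds one dict (label -> local probability) per candidate word.
    Returns up to beam_size (labels, log_prob) pairs, best first.
    """
    beam = [((), 0.0)]
    for dist in role_probs:
        expanded = [
            (labels + (label,), log_prob + math.log(p))
            for labels, log_prob in beam
            for label, p in dist.items()
            if p > 0.0
        ]
        beam = heapq.nlargest(beam_size, expanded, key=lambda c: c[1])
    return beam


def features(labels, log_prob):
    """Hypothetical global features over the whole predicate frame."""
    feats = defaultdict(float)
    feats["local_log_prob"] = log_prob  # score from the local classifier
    for label in set(labels):
        if label != "NONE" and labels.count(label) > 1:
            feats["dup:" + label] = 1.0  # the same role assigned twice
    feats["frame:" + " ".join(l for l in labels if l != "NONE")] = 1.0
    return feats


def score(weights, feats):
    return sum(weights[name] * value for name, value in feats.items())


def rerank(weights, candidates):
    """Return the candidate with the highest linear score."""
    return max(candidates, key=lambda c: score(weights, features(*c)))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def pa_update(weights, candidates, gold, C=1.0):
    """One Passive-Aggressive (PA-I) step toward the best candidate in the beam."""
    oracle = min(candidates, key=lambda c: hamming(c[0], gold))
    predicted = rerank(weights, candidates)
    cost = hamming(predicted[0], gold) - hamming(oracle[0], gold)
    if cost <= 0:
        return  # passive: the prediction is already as good as the oracle
    f_o, f_p = features(*oracle), features(*predicted)
    diff = {k: f_o.get(k, 0.0) - f_p.get(k, 0.0) for k in set(f_o) | set(f_p)}
    norm_sq = sum(v * v for v in diff.values())
    loss = cost - score(weights, diff)  # margin-scaled hinge loss
    if loss > 0 and norm_sq > 0:
        tau = min(C, loss / norm_sq)  # aggressive: step size capped by C
        for name, value in diff.items():
            weights[name] += tau * value


# Toy example: two candidate words for one predicate, gold frame (A0, A1).
role_probs = [
    {"A0": 0.6, "A1": 0.3, "NONE": 0.1},
    {"A0": 0.5, "A1": 0.4, "NONE": 0.1},
]
candidates = beam_search(role_probs, beam_size=3)
weights = defaultdict(float, {"local_log_prob": 1.0})  # start from the local model
print(rerank(weights, candidates)[0])  # ('A0', 'A0'): locally best, but A0 is duplicated
pa_update(weights, candidates, gold=("A0", "A1"))
print(rerank(weights, candidates)[0])  # ('A0', 'A1')
```

     In this toy case the local probabilities alone prefer a frame with two A0 roles. A single update is enough for the re-ranker to penalize the duplicated role and move the gold frame to the top, which is the kind of interaction between roles that a purely local labeling stage cannot capture.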
