Syntactic and Semantic Analysis of Ancient Chinese Based on Database Semantics
Abstract
Natural language processing of Chinese has focused mainly on modern Chinese. Ancient Chinese, however, is an important carrier of traditional Chinese culture, and its automatic syntactic and semantic analysis matters for introducing that culture to the West. Research on the automatic syntactic and semantic analysis of ancient Chinese should also advance the automatic analysis of modern Chinese.
     Although the automatic syntactic and semantic analysis of modern Chinese has made considerable progress, general-purpose grammars and existing parsers fall short when applied to the syntactic analysis and semantic representation of ancient Chinese. This research takes Database Semantics as its theoretical framework and Left-Associative Grammar, one of its major components, as its technical basis. Following the principles of time-linearity and descriptiveness, it analyzes the basic syntactic and semantic relations of ancient Chinese, using the original text of Zuo Zhuan and its English translation as the corpus. The work consists of the following parts.
     First, following the requirements of syntactic analysis in Database Semantics, we built a bilingual lexicon from the original text of Zuo Zhuan and its English translation. Each entry is stored as a "proplet", a non-recursive feature structure, i.e. a set of attribute-value pairs. The grammatical and semantic information of a word is annotated in detail as the values of the corresponding attributes. This meets the needs of syntactic-semantic analysis based on Database Semantics and Left-Associative Grammar, and at the word level it reduces the ambiguity caused by polysemy and by temporary shifts in part of speech.
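The proplet format described above can be illustrated with a small Python sketch. It is not the thesis's lexicon: the attribute names (`sur`, `noun`, `verb`, `cat`, `sem`, `arg`, `prn`) loosely follow common Database Semantics notation and are assumptions here, and the two entries for 军 are hypothetical examples of one surface form carrying a noun reading and a verb reading.

```python
from typing import Dict, List, Optional, Union

# A proplet is modelled as a flat attribute-value mapping: values are atoms
# or lists of atoms, never nested feature structures (assumed encoding).
Value = Union[str, int, List[str]]
Proplet = Dict[str, Value]

# Hypothetical lexicon: one surface form maps to several candidate proplets,
# one per reading (noun "army" vs. verb "encamp").
LEXICON: Dict[str, List[Proplet]] = {
    "军": [
        {"sur": "军", "noun": "army", "cat": ["n"], "sem": [], "prn": 0},
        {"sur": "军", "verb": "encamp", "cat": ["v"], "sem": [], "arg": [], "prn": 0},
    ],
}


def is_non_recursive(proplet: Dict[str, object]) -> bool:
    """Return True if no value (or list element) is itself a structure."""
    for value in proplet.values():
        items = value if isinstance(value, list) else [value]
        if any(isinstance(item, (dict, list)) for item in items):
            return False
    return True


def lookup(surface: str, category: Optional[str] = None) -> List[Proplet]:
    """Return candidate proplets for a surface form, optionally filtered by category."""
    candidates = LEXICON.get(surface, [])
    if category is None:
        return list(candidates)
    return [p for p in candidates if category in p["cat"]]
```

Storing each reading as its own flat entry lets a parser select among candidates by category during combination, which is one way the word-level ambiguity mentioned above could be narrowed.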
     Second, we re-examined the grammatical features and sentence patterns of Zuo Zhuan from the perspective of natural language processing and, guided by the needs of automatic parsing with Left-Associative Grammar, formulated a set of basic syntactic rules. Starting from the basic usage of three major word classes (nouns, verbs and adjectives), the analysis covers basic structures such as subject-predicate, coordination and verb-object, together with variants of these structures, including object fronting, semantic passive, formal passive and ellipsis of constituents.
     Third, we propose conditional transplantation for function words and other words in auxiliary position. When a content word absorbs a function word, the core value and/or the semantic value of the function word is retained under certain conditions. This differs from the complete absorption used in the original Database Semantics and can avoid much of the backtracking that might otherwise arise in later language production and machine translation.
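The contrast between complete absorption and conditional transplantation can be sketched as follows. The abstract does not state the actual conditions, so they are passed in as caller-supplied predicates; the function name `absorb`, the attribute `fw_core`, and the sample proplets are all hypothetical.

```python
from typing import Callable, Dict


def absorb(
    content: Dict[str, object],
    function_word: Dict[str, object],
    keep_core: Callable[[Dict[str, object]], bool],
    keep_sem: Callable[[Dict[str, object]], bool],
) -> Dict[str, object]:
    """Absorb a function-word proplet into a content-word proplet.

    The content proplet is copied, not mutated. The function word's semantic
    values and core value are carried over only if the respective predicate
    accepts it; with both predicates false this reduces to complete absorption.
    """
    merged = {k: (list(v) if isinstance(v, list) else v) for k, v in content.items()}
    sem = merged.setdefault("sem", [])
    if keep_sem(function_word):
        for value in function_word.get("sem", []):
            if value not in sem:
                sem.append(value)
    if keep_core(function_word):
        # Hypothetical attribute recording the retained core value.
        merged.setdefault("fw_core", []).append(function_word["core"])
    return merged


# Illustrative proplets (assumed attribute names and values).
place = {"sur": "函陵", "noun": "Hanling", "cat": ["n"], "sem": []}
prep = {"sur": "于", "core": "yu", "cat": ["prep"], "sem": ["loc"]}

complete = absorb(place, prep, keep_core=lambda f: False, keep_sem=lambda f: False)
conditional = absorb(
    place,
    prep,
    keep_core=lambda f: "prep" in f["cat"],
    keep_sem=lambda f: bool(f.get("sem")),
)
```

Because the retained values stay on the merged proplet, a later generation or translation step could read them directly instead of reconstructing which function word was absorbed.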
     Fourth, the derivation represents language content rather than surface structure. We analyze deep semantic relations and pragmatic meaning, and rule operations add new values to the semantic attribute of the corresponding word. These additional values, such as agent, patient, experiencer, rhetorical and passive, indicate semantic roles and pragmatic functions.
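A rule operation that adds role values to the semantic attribute might look like the sketch below. The role labels come from the abstract; the rule itself, the function names and the attribute names (`sem`, `arg`, `fnc`) are assumptions made for illustration.

```python
from typing import Dict


def add_sem(proplet: Dict[str, object], value: str) -> None:
    """Add a value to the proplet's `sem` list if it is not already present."""
    sem = proplet.setdefault("sem", [])
    if value not in sem:
        sem.append(value)


def combine_subject_predicate(
    subject: Dict[str, object], verb: Dict[str, object], passive: bool = False
) -> None:
    """Hypothetical subject-predicate rule that also records semantic roles.

    In the active case the subject is marked as agent; in the passive case
    the subject is marked as patient and the verb as passive. The proplets
    are cross-linked through `arg` (on the verb) and `fnc` (on the noun).
    """
    add_sem(subject, "patient" if passive else "agent")
    if passive:
        add_sem(verb, "passive")
    verb.setdefault("arg", []).append(subject["noun"])
    subject["fnc"] = verb["verb"]


# Illustrative proplets (assumed values, not taken from the thesis lexicon).
noun = {"sur": "晋", "noun": "Jin", "cat": ["n"], "sem": []}
verb = {"sur": "军", "verb": "encamp", "cat": ["v"], "sem": [], "arg": []}
combine_subject_predicate(noun, verb)
```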
     We expect that the automatic syntactic-semantic analysis based on the improved Database Semantics can later be applied to other large-scale corpora, for example ancient Chinese texts that were produced in a different era from Zuo Zhuan and have different grammatical features. Language production and machine translation built on this research are also among the directions of our future work.
