Research on the Visual Semantic Representation of Language and Its Application in Automatic Scene Description Systems
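Abstract
Lexical semantic analysis plays an important role in natural language processing. Most existing lexical semantic theories and analysis techniques represent meaning through relations between words; put simply, they use one set of words to explain another. This approach has been applied in many areas, such as machine translation and question answering, but it falls short in others, such as generating language descriptions for images or human-machine interaction involving objects in real situations. The main problem is that language is not connected to actual perception. A new research trend that aims to connect language and perception imitates the human language acquisition mechanism and builds language acquisition and computational models based on various kinds of perceptual information. The interaction between visual cognition and language cognition during acquisition has attracted particular attention. This task can be called vision grounded language acquisition. Such research extends the original single-modality approach, in which language meaning is represented by language, to a representation of language meaning based on visual information, thereby associating language concepts with perceptual information and enabling human-machine interaction grounded in objects in real situations.
     On the other hand, with the rapid development of computer technology and the Internet, multimedia information such as documents, images, and videos is growing quickly, and the need to process this massive unstructured information by computer is becoming increasingly urgent. Against this background, this dissertation studies the association process, representation schemes, and acquisition methods linking visual information and language information. The main work and contributions are as follows:
     1. Association between the visual features of static word classes and language vocabulary
     Nouns and adjectives are the first words acquired in human language acquisition. They refer directly to perceptual features of entities in the external world, and their visual information can be contained in static images, so they can be regarded as static word classes. This dissertation studies semantic acquisition for static word classes mainly by building a computational model, ViMac, that automatically associates the visual features of images with descriptive words. Acquisition in ViMac has four steps: preprocessing of the two-channel information, computation of semantic association vectors based on the Hellinger distance, word clustering based on a hybrid metric, and visual feature selection based on the multi-dimensional Hellinger distance. These four modules establish the correspondence between visual features and words from the low level to the high level. Within these modules, the choice of metric for the distance between visual feature distributions is key to learning performance. We therefore compare the one-dimensional and multi-dimensional forms of the Kullback-Leibler distance and the Hellinger distance in semantic association vector computation and visual feature selection. Experimental results show that the Hellinger distance significantly improves the association between the visual features of static word classes and words.
     2. Visual semantic representation schemes and language output algorithms for static word classes
     Once static words are directly associated with visual features, lexical semantics can take different representational forms in the visual space. When ViMac generates language descriptions for images, these forms of visual semantic representation affect the output algorithm and output performance to different degrees. This dissertation therefore proposes and studies three visual semantic representation schemes for static word classes: a Gaussian-model-based representation, a K-nearest-neighbor-based representation, and a core-component-based representation. The core-component-based representation draws on the way humans represent meaning with a semantic center and periphery, and a compound word generation algorithm is designed on this basis. The algorithm can generate compound words not learned from the training data, so it can produce language descriptions for new visual scenes at evaluation time and overcome the sparsity of the training corpus. Output sentences are evaluated automatically with BLEU. A comparison of the language output algorithms based on the three representations shows that the compound word algorithm can generate new words not acquired in the predefined word set, overcome subjective differences in the annotation of the training corpus, and improve the computational efficiency of the output algorithm; its overall performance is therefore better than that of the other two algorithms. Experiments on the compound word algorithm itself also reveal different patterns in how humans use core words and compound words.
     3. Visual semantic representation of dynamic word classes
     Verbs are acquired later in human language acquisition and have a certain complexity: explaining their meaning requires basic words such as nouns and adverbs. Their semantics usually refer to an action event, which can be contained in dynamic video, so we classify them as dynamic word classes. Given these properties, this dissertation first uses frame semantics to define the structure of verb semantic representation, which consists of frames and arguments. A frame is a cognitive structure that organizes situational knowledge, while arguments are governed by the frame to describe a specific situation. Based on this definition of verb semantics, we build ViMac-V, a verb semantic acquisition model based on video information. Both the visual channel information and the language channel information in ViMac-V are more complex than in the static word class model ViMac, especially the extraction of frames and arguments from the language channel. ViMac-V first selects basic classification words using co-occurrence between visual features and words, and then partitions argument word classes using a word metric based on part of speech and minimum edit distance. Once the groups of argument word classes are obtained, a bigram language model is used to extract verb frames. Experiments demonstrate the effectiveness of ViMac-V in extracting frames and argument components: 5 groups of frames and 4 groups of argument word classes (62 argument words) were acquired for 7 verbs.
     4. Association between the semantic representation of dynamic word classes and video information
     In ViMac-V, verb semantics are associated with video information mainly through a group of self-organizing neural networks. A frame activation mechanism based on learning vector quantization associates the cognitive perspective highlighted by the video with verb frames, while argument words are categorized in the visual space through SOM network training, neuron clustering, and language concept acquisition. The categorized SOM networks connect the distribution of high-dimensional video features with argument words, and the frames then link the SOM sub-networks in different configurations to form different visual semantics of verbs. The completed ViMac-V model is deployed on an MT-AR robot platform, where a camera and speech output extend the visual and language perception capabilities of ViMac-V. A verb output selection algorithm based on the co-occurrence rate of frames and arguments is also designed to generate natural language descriptions that are closer to the video scene. Experiments on language output for real dynamic scenes show that the verb semantic representations acquired by ViMac-V can generate correct natural language descriptions of ball movement events in real scenes.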
Lexical semantic analysis is an important research topic in Natural Language Processing (NLP). In most existing theories and technologies, semantic representations are based on relations between words or concepts. In brief, a word is explained conceptually by other words or by its relations with other words. This type of semantics has been widely applied in fields such as machine translation and question answering. However, it is of little help in other tasks, such as situated human-machine interaction and automatic text description of images. The main reason is that, in this type of semantics, words have no connection to perceptual information. To link language and perception, a new trend in NLP research imitates the human language acquisition mechanism and builds computational models that learn semantics from various kinds of sensorimotor information. Among these, visual cognition and its relationship with language ability have gained special attention. This task is called Vision Grounded Language Acquisition. Language grounding research extends the original mono-modal language representation to a method based on vision-language association. Language concepts are thus associated with sensorimotor information, enabling human-machine interaction in real circumstances.
     In addition, with the rapid development of computer science and the Internet, multimedia information such as documents, images, and videos is increasing dramatically. The demand for processing this massive unstructured information with computers is becoming more and more urgent. Against this background, this dissertation focuses on the association process, representation methods, and acquisition algorithms linking visual information and language information. The main work and innovations are summarized as follows:
     1. Research on association between visual features and static lexicons
     As the first words acquired by humans, nouns and adjectives refer directly to the sensed features of objects in the real world. Their visual information can be contained in static images, so they can be classified as static lexicons. In this dissertation we borrow ideas from child language acquisition and build a learning model, ViMac, to automatically associate information in the visual modality with information in the language modality. ViMac consists of four modules: dual-modal information preprocessing and feature extraction, Hellinger-distance-based semantic association vector computation, hybrid-metric-based word clustering, and multi-dimensional Hellinger-distance-based visual feature selection. Through these modules, the correspondences between visual features and words are established from the bottom up. In these learning modules, the distance that measures the divergence between distributions of visual features is key to learning performance. We therefore compare the results of semantic association vector computation and visual feature selection under four distances: one-dimensional Kullback-Leibler, one-dimensional Hellinger, multi-dimensional Kullback-Leibler, and multi-dimensional Hellinger. Experimental results show that the one-dimensional and multi-dimensional Hellinger distances significantly improve the association between visual features and static lexicons.
     2. Research on semantic representation schemes and language output algorithm
     After static lexicons are associated with visual features, lexical semantics can take various representations in visual subspaces. When ViMac uses the acquired lexical semantics to generate language descriptions for images, these representations affect the output algorithms and the description performance differently. This dissertation therefore proposes three visual semantic representations for static lexicons: a Gaussian-based representation, a KNN-based representation, and a core-based representation. The core-based method draws on the cognitive science finding that human semantic representations can be divided into a center and an edge. Based on the core-based representation, a novel compound generation method is proposed. The compound method can overcome the data sparsity problem, generate compounds not learned from the training set, and output corresponding descriptions for new scenes at test time. Output sentences are evaluated automatically with BLEU. Comparative experiments are carried out on the output algorithms based on the three representations. The results show that the compound generation method can generate new words not acquired in the predefined word set, overcome the subjective variability in the training data, and significantly improve computational efficiency. Its overall performance is therefore superior to that of the other two algorithms. The experimental results on the compound method itself also reveal different patterns in how humans use core words and compounds.
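     To illustrate how BLEU scores a generated description against a human one, the sketch below implements a simplified sentence-level variant with a single reference and n-grams up to length 2. The n-gram limit, the lack of smoothing, and the example sentences are assumptions made for illustration; the dissertation's exact evaluation setup is not stated in this abstract.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often as in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    if not cand:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times the brevity penalty (no smoothing)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_avg)

# Hypothetical system output and human annotation for one image.
output = "a red ball".split()
annotation = "a small red ball".split()
print(bleu(output, annotation))
```

     The brevity penalty lowers the score of outputs that are shorter than the reference, so a description cannot score well merely by emitting a few safe words.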
     3. Research on visual semantic representation of dynamic lexicons
     As words learned later in human language acquisition, verbs have a certain degree of complexity. Explaining their meaning requires basic words such as nouns and adverbs. The semantics of a verb often refer to an action event, which can be contained in a dynamic video, so verbs can be classified as dynamic lexicons. To address this complexity, the structure of verb semantic representation is first defined on the basis of frame semantics and consists of two parts: frame and arguments. In this representation, the frame can be regarded as a cognitive model that organizes situational knowledge related to the linguistic context. A detailed description is then realized by selecting among the members categorized under the different arguments. With this representation, a video-based verb semantic acquisition model, ViMac-V, is constructed. The information in both the visual modality and the language modality is more complex in ViMac-V than in ViMac, especially for the extraction of frames and arguments from the language modality. ViMac-V first selects the basic classification words using co-occurrences between visual features and words. A hybrid word measure based on POS information and minimum edit distance is then used to classify argument lexicons. After each group of argument lexicons is acquired, a bigram model is used to extract verb frames. Experimental results demonstrate the effectiveness of ViMac-V in extracting frames and arguments. In total, 5 groups of frames and 4 groups of arguments (62 lexicons) related to 7 verbs are learned by ViMac-V.
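     A word measure that combines POS information with minimum edit distance might look like the sketch below. The Levenshtein routine is standard; the way the two signals are combined (normalized edit distance plus a fixed penalty for a POS mismatch) and the parameter `pos_penalty` are hypothetical, since the abstract does not give the actual formula.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def hybrid_distance(word_a, pos_a, word_b, pos_b, pos_penalty=1.0):
    """Hypothetical hybrid measure: length-normalized edit distance plus a POS mismatch penalty."""
    longest = max(len(word_a), len(word_b), 1)
    surface = edit_distance(word_a, word_b) / longest
    return surface + (0.0 if pos_a == pos_b else pos_penalty)

# Hypothetical argument words with POS tags.
print(hybrid_distance("左边", "n", "右边", "n"))   # shared character, same POS
print(hybrid_distance("左边", "n", "快速", "ad"))  # no shared character, different POS
```

     Words with a small hybrid distance would then be grouped into the same argument class, for example direction words in one group and speed words in another.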
     4. Research on the association between video information and semantic representation of dynamic lexicons
     In ViMac-V, the association of video information with frames and arguments is realized through a group of self-organizing networks. The association between verb frames and the cognitive perspectives highlighted by video information is realized through a frame activation mechanism based on the Learning Vector Quantization (LVQ) algorithm. The argument lexicons governed by verb frames are categorized in visual spaces through SOM network training, neuron clustering, and language concept acquisition. The SOMs connect the distribution of high-dimensional video features with argument lexicons. The frames link the SOM sub-networks in different configurations to express different verb semantics. The completed ViMac-V is deployed on the MT-AR robot platform, which uses a camera and speech output to extend the visual and language abilities of ViMac-V. In addition, a verb selection algorithm based on the co-occurrences between frames and arguments is proposed to generate natural language descriptions that are closer to the video scenes. Description output experiments show that the visual representations acquired by ViMac-V can be used to generate correct natural language descriptions of small ball movement events in complex real circumstances.
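     The idea behind an LVQ-based frame activation step can be sketched with the basic LVQ1 update rule: the prototype nearest to a training sample moves toward it when their labels agree and away from it otherwise. The two-dimensional features, the frame labels "roll" and "bounce", the learning rate, and the epoch count are all assumptions for illustration, not details taken from ViMac-V.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(prototypes, x):
    """Index of the prototype whose vector is closest to x."""
    return min(range(len(prototypes)), key=lambda i: squared_distance(prototypes[i][0], x))

def lvq1_train(prototypes, samples, lr=0.1, epochs=20):
    """LVQ1: pull the winning prototype toward same-label samples, push it from others."""
    protos = [(list(vec), label) for vec, label in prototypes]
    for _ in range(epochs):
        for x, label in samples:
            i = nearest(protos, x)
            vec, proto_label = protos[i]
            sign = 1.0 if proto_label == label else -1.0
            protos[i] = ([w + sign * lr * (xi - w) for w, xi in zip(vec, x)], proto_label)
    return protos

def classify(protos, x):
    """Label (here: the activated frame) of the prototype nearest to x."""
    return protos[nearest(protos, x)][1]

# Hypothetical video features: (horizontal speed, vertical speed) of a ball.
samples = [([1.0, 0.1], "roll"), ([0.9, 0.0], "roll"), ([1.1, 0.2], "roll"),
           ([0.1, 1.0], "bounce"), ([0.2, 0.9], "bounce"), ([0.0, 1.1], "bounce")]
initial = [([0.6, 0.4], "roll"), ([0.4, 0.6], "bounce")]

trained = lvq1_train(initial, samples)
print(classify(trained, [0.95, 0.05]))
print(classify(trained, [0.05, 0.95]))
```

     In a full system each frame would typically have several prototypes, and the activated frame would then select which argument SOM sub-networks are consulted.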
