Abstract
Words, as the basic semantic units of a language model, are strongly associated with their context words throughout the semantic space; conversely, the meaning of a word can be inferred from its context words. Word representation learning maps the association between words and their context words into a low-dimensional vector space through a class of shallow neural network models. However, existing word representation learning methods usually consider only the structural association between a word and its context, while the intrinsic semantic information carried by the word itself is ignored. This paper therefore proposes the DEWE word representation learning algorithm, which accounts not only for the structural association between words and their contexts during training, but also integrates the semantic information of the word itself into the representation model, so that the learned representations capture both structural and semantic regularities. Experimental results show that DEWE is a practical and effective word representation learning method: compared with the baseline algorithms used in this paper, DEWE achieves excellent performance on six similarity evaluation datasets.
Words, as the basic semantic units in language models, are strongly related to their context words in the whole semantic space. Word representation learning aims at mapping the relationship between words and context words into a low-dimensional vector space using shallow neural network models. However, existing word representation learning methods usually consider only the syntagmatic relations between words, without directly capturing the paradigmatic information. In this paper, a new word representation learning algorithm, DEWE, is proposed to integrate the semantic information of the word itself into the training of word representations. The structural and semantic generalization of the proposed method is validated on 6 similarity evaluation datasets, with all results confirming the excellent performance of DEWE.
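The shallow models the abstract refers to can be illustrated with a minimal skip-gram-with-negative-sampling sketch in the word2vec style: each word/context co-occurrence within a window pulls the two vectors together, while randomly sampled negative pairs push vectors apart. This is only the structural (context-based) baseline that DEWE extends; DEWE's dictionary-derived semantic component is not reproduced here, and the toy corpus, dimensionality, and learning rate are illustrative values, not the paper's settings.

```python
import math
import random

# Toy corpus; "cat" and "dog" occur in near-identical contexts.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

random.seed(0)
DIM, WINDOW, LR, EPOCHS, NEG = 16, 2, 0.05, 200, 3  # illustrative hyperparameters
w_in = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in vocab]   # target vectors
w_out = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in vocab]  # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for _ in range(EPOCHS):
    for pos, word in enumerate(corpus):
        w = idx[word]
        for ctx_pos in range(max(0, pos - WINDOW),
                             min(len(corpus), pos + WINDOW + 1)):
            if ctx_pos == pos:
                continue
            # one observed (positive) pair plus NEG random negative samples
            pairs = [(idx[corpus[ctx_pos]], 1.0)] + [
                (random.randrange(len(vocab)), 0.0) for _ in range(NEG)]
            for t, label in pairs:
                g = LR * (sigmoid(dot(w_in[w], w_out[t])) - label)
                h = list(w_in[w])  # cache target vector before updating it
                for d in range(DIM):
                    w_in[w][d] -= g * w_out[t][d]
                    w_out[t][d] -= g * h[d]

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Words sharing contexts tend to drift toward each other during training.
print(round(cosine(w_in[idx["cat"]], w_in[idx["dog"]]), 3))
```

Similarity-evaluation benchmarks such as the 6 datasets used in the paper score embeddings by correlating such cosine similarities with human judgments.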
References
[1] Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]//Proceedings of Advances in Neural Information Processing Systems 26, 2013. arXiv:1310.4546.
[2] Mikolov T, Chen K, Corrado G S, et al. Efficient Estimation of Word Representations in Vector Space[C]//Proceedings of the 2013 International Conference on Learning Representations. arXiv:1301.3781.
[3] Bengio Y, Ducharme R, Vincent P, et al. A Neural Probabilistic Language Model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[4] Uchida J, Nara R, Miyaoka Y, et al. A Fast Elliptic Curve Cryptosystem LSI Embedding Word-based Montgomery Multiplier[J]. IEICE Transactions on Electronics, 2006, E89-C(3): 5-10.
[5] Levy O, Goldberg Y, Dagan I. Improving Distributional Similarity with Lessons Learned from Word Embeddings[J]. Transactions of the Association for Computational Linguistics, 2015, 3: 211-225.
[6] Hamilton W L, Clark K, Leskovec J, et al. Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016: 595-605.
[7] Hamilton W L, Leskovec J, Jurafsky D. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016: 1489-1501.
[8] Levy O, Goldberg Y. Neural Word Embedding as Implicit Matrix Factorization[C]//Proceedings of Advances in Neural Information Processing Systems 27, 2014: 2177-2185.
[9] Liu Y, Liu Z Y, Chua T S, et al. Topical Word Embeddings[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2015: 2418-2424.
[10] Levy O, Goldberg Y. Dependency-Based Word Embeddings[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014: 302-308.
[11] Huang E H, Socher R, Manning C D, et al. Improving Word Representations via Global Context and Multiple Word Prototypes[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012: 873-882.
[12] Mnih A, Teh Y W. A Fast and Simple Algorithm for Training Neural Probabilistic Language Models[C]//Proceedings of the 29th International Conference on Machine Learning (ICML'12), 2012: 1751-1758.
[13] Church K W, Hanks P. Word Association Norms, Mutual Information, and Lexicography[J]. Computational Linguistics, 1990, 16(1): 22-29.
[14] Dagan I, Pereira F, Lee L. Similarity-based Estimation of Word Cooccurrence Probabilities[C]//Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994: 272-278.
[15] Turney P D, Pantel P. From Frequency to Meaning: Vector Space Models of Semantics[J]. Journal of Artificial Intelligence Research, 2010, 37(1): 141-188.
[16] Natarajan N, Dhillon I S. Inductive Matrix Completion for Predicting Gene-Disease Associations[J]. Bioinformatics, 2014, 30(12): i60-i68.
[17] Bruni E, Boleda G, Baroni M, et al. Distributional Semantics in Technicolor[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012: 136-145.
[18] Luong M T, Socher R, Manning C D. Better Word Representations with Recursive Neural Networks for Morphology[C]//Proceedings of the Seventeenth Conference on Computational Natural Language Learning, 2013: 104-113.
[19] Radinsky K, Agichtein E, Gabrilovich E, et al. A Word at a Time: Computing Word Relatedness Using Temporal Semantic Analysis[C]//Proceedings of the 20th International Conference on World Wide Web, 2011: 337-346.
[20] Harris Z. Distributional Structure[J]. Word, 1954, 10(2-3): 146-162.
[21] Finkelstein L, Gabrilovich E, Matias Y, et al. Placing Search in Context: The Concept Revisited[J]. ACM Transactions on Information Systems, 2002, 20(1): 116-131.
[22] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014: 1532-1543.
(1)https://en.wiktionary.org/wiki/Wiktionary:Main_Page
(2)http://www.dictionary.com/
(3)https://blog.csdn.net/shijiebei2009/article/details/39696523