Abstract
Existing network representation learning algorithms focus mainly on representing network structure and ignore the rich textual attribute information attached to nodes in real-world networks. To effectively fuse network structure information with node text attributes, this paper proposes a new network representation learning algorithm. So that the two sources of information constrain each other during training, the algorithm builds a coupled neural network training model based on parameter sharing, and applies an optimization strategy of negative sampling with stochastic gradient descent to make training converge quickly. Experimental results on node classification show that the proposed algorithm achieves better classification performance than the Doc2Vec, DeepWalk, DW+D2V, and TADW algorithms.
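The abstract does not spell out the coupled model's internals, so the following is only a minimal sketch of the general idea: a DeepWalk-style skip-gram over node context pairs and a Doc2Vec-style node-to-word objective share one node-embedding table, so each SGD step with negative sampling on either objective also constrains the other. The class and method names (`CoupledEmbedding`, `train_structure`, `train_text`) are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CoupledEmbedding:
    """Sketch of a coupled model: one shared node-embedding table is
    updated by two objectives -- a structural skip-gram (node predicts
    a random-walk context node) and a text objective (node predicts a
    word from its attached text) -- so structure and text constrain
    each other through the shared parameters."""

    def __init__(self, n_nodes, n_words, dim=16, lr=0.025, neg=5):
        self.node_emb = rng.normal(0, 0.1, (n_nodes, dim))  # shared input table
        self.ctx_emb = rng.normal(0, 0.1, (n_nodes, dim))   # structure outputs
        self.word_emb = rng.normal(0, 0.1, (n_words, dim))  # text outputs
        self.n_nodes, self.n_words = n_nodes, n_words
        self.lr, self.neg = lr, neg

    def _sgns_step(self, v, target, out_table, n_out):
        """One negative-sampling SGD step: push v toward the target's
        output vector and away from `neg` random negatives; updates the
        output table in place and returns the gradient on v."""
        grad_v = np.zeros_like(v)
        samples = [(target, 1.0)] + [(int(rng.integers(n_out)), 0.0)
                                     for _ in range(self.neg)]
        for j, label in samples:
            u = out_table[j]
            g = (sigmoid(v @ u) - label) * self.lr
            grad_v += g * u
            out_table[j] = u - g * v
        return grad_v

    def train_structure(self, center, context):
        # Structural branch: skip-gram with negative sampling.
        v = self.node_emb[center].copy()
        self.node_emb[center] = v - self._sgns_step(v, context, self.ctx_emb, self.n_nodes)

    def train_text(self, node, word):
        # Text branch: same shared node vector predicts a word.
        v = self.node_emb[node].copy()
        self.node_emb[node] = v - self._sgns_step(v, word, self.word_emb, self.n_words)
```

Because `node_emb` is shared, alternating `train_structure` and `train_text` steps over walk pairs and (node, word) pairs realizes the mutual constraint described in the abstract; negative sampling keeps each step cheap, which is what allows fast convergence under SGD.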