Target Community Detection Based on Attribute Subspace with Entropy Weight


English title: Target Community Detection Based on Attribute Subspace with Entropy Weight
Authors: LIU Haijiao (刘海姣); MA Huifang (马慧芳); CHANG Yang (昌阳); LI Zhixin (李志欣)
Affiliations: College of Computer Science and Engineering, Northwest Normal University; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology; Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University
Keywords: entropy; attribute weight; community detection; user preferences
Journal: Journal of Chinese Information Processing (中文信息学报)
Publication date: 2019-08-15
Publisher: Journal of Chinese Information Processing
Year: 2019
Issue: 08
Funding: National Natural Science Foundation of China (61762078, 61363058, 61663004); Research Project of Guangxi Key Laboratory of Trusted Software (kx201705); Open Fund of Guangxi Key Lab of Multi-source Information Mining and Security (MIMS18-08)
Language: Chinese
Pages: 116-125
Page count: 10
CN: 11-2325/N
ISSN: 1003-0077
Classification code: TP301.6

Abstract

This paper proposes a target community detection method based on an entropy-weighted attribute subspace, which mines the communities related to a user's preferences. First, node similarity is computed from both attributes and structure, and the sample nodes given by the user are expanded with their neighbors to obtain the center node set of the target community. Second, an entropy-weighted attribute weighting method is applied to the center node set to obtain the attribute subspace weights of the target community. Third, the edge weights of the network are rewritten using these subspace weights, based on the attribute and structural similarity of nodes. Finally, a community fitness function is defined and then improved with the rewritten edge weights; taking the center node set as the core, the method detects a target community that matches the user's preferences, is densely connected internally, and is well separated from the rest of the network. The method can also be extended to detecting multiple communities in a network and to outlier detection. Experimental results on synthetic networks and real-world network datasets verify the efficiency and effectiveness of the proposed algorithm.
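The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the weighting, edge-rewriting, and fitness steps under stated assumptions. The weight formula follows the entropy weighting k-means idea of reference [12] (weights proportional to exp(-dispersion / gamma)); the Jaccard structural similarity, the mixing parameter `alpha`, the ratio-style fitness function, and all function names are hypothetical choices made for illustration, not the authors' definitions.

```python
import math

def entropy_weights(center_attrs, gamma=1.0):
    """Attribute weights from dispersion within the center node set.

    Assumption: EWKM-style weights, w_j proportional to exp(-D_j / gamma),
    where D_j is the mean squared deviation of attribute j over the centers.
    Attributes on which the center nodes agree receive larger weights.
    """
    n = len(center_attrs)
    dims = len(center_attrs[0])
    means = [sum(row[j] for row in center_attrs) / n for j in range(dims)]
    disp = [sum((row[j] - means[j]) ** 2 for row in center_attrs) / n
            for j in range(dims)]
    scores = [math.exp(-d / gamma) for d in disp]
    total = sum(scores)
    return [s / total for s in scores]

def rewrite_edge_weights(edges, attrs, weights, alpha=0.5):
    """Rewrite edge weights from attribute and structural similarity.

    Assumption: attribute similarity is exp(-weighted squared distance),
    structural similarity is the Jaccard index of closed neighborhoods,
    and the two are mixed linearly by the hypothetical parameter alpha.
    """
    nbrs = {u: {u} for u in attrs}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    new_w = {}
    for u, v in edges:
        dist = sum(w * (a - b) ** 2
                   for w, a, b in zip(weights, attrs[u], attrs[v]))
        attr_sim = math.exp(-dist)
        struct_sim = len(nbrs[u] & nbrs[v]) / len(nbrs[u] | nbrs[v])
        new_w[(u, v)] = alpha * attr_sim + (1 - alpha) * struct_sim
    return new_w

def fitness(community, edge_w):
    """Share of the community's incident edge weight that stays inside it.

    Assumption: a simple internal / (internal + boundary) ratio; the paper's
    improved fitness function may differ.
    """
    inside = sum(w for (u, v), w in edge_w.items()
                 if u in community and v in community)
    boundary = sum(w for (u, v), w in edge_w.items()
                   if (u in community) != (v in community))
    return inside / (inside + boundary) if inside + boundary > 0 else 0.0

# Toy network: nodes 0-2 agree on attribute 0, node 3 differs.
attrs = {0: [1.0, 0.0], 1: [1.0, 1.0], 2: [1.0, 0.5], 3: [0.0, 0.2]}
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
centers = [0, 1, 2]

weights = entropy_weights([attrs[c] for c in centers])
edge_w = rewrite_edge_weights(edges, attrs, weights)
print(weights)
print(fitness({0, 1, 2}, edge_w), fitness({2, 3}, edge_w))
```

In this toy network the center nodes share the same value on the first attribute, so that attribute receives the larger weight, and the edge to node 3 is down-weighted relative to the edges among the centers. A greedy expansion that adds neighbors to the center set while the fitness increases would be one plausible way to complete the last step, though the abstract does not specify the search procedure.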

References

[1] He Chaobo, Tang Yong, Liu Hai, et al. A community mining method integrating link and attribute information[J]. Chinese Journal of Computers, 2017, 40(3): 601-616. (in Chinese)
[2] Cheng K, Li J, Liu H. Unsupervised Feature Selection in Signed Social Networks[C]//Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2017: 777-786.
[3] Perozzi B, Akoglu L. Focused clustering and outlier detection in large attributed graphs[C]//Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2014: 1346-1355.
[4] Fang Qian, Dou Yongxiang, Wang Bangjin. A review of community detection research in social media environments based on Web of Science[J]. Journal of Chinese Information Processing, 2017, 31(3): 1-8. (in Chinese)
[5] Chen F, Zhou B, Alim A, et al. A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks[J]. Proceedings of International Conference on Data Mining. 2017: 41-50.
[6] Gunnemann S, Farber I, Raubach S, et al. Spectral Subspace Clustering for Graphs with Feature Vectors[C]//Proceedings of International Conference on Data Mining. 2014: 231-240.
[7] Li X, Wu Y, Ester M, et al. Semi-supervised Clustering in Attributed Heterogeneous Information Networks[C]//Proceedings of International Conference on World Wide Web. 2017: 1621-1629.
[8] Yang L, Cao X, He D, et al. Modularity based community detection with deep learning[C]//Proceedings of International Joint Conference on Artificial Intelligence. AAAI Press, 2016: 2252-2258.
[9] Wu P, Pan L. Mining target attribute subspace and set of target communities in large attributed networks[J/OL]. http://arXiv.org/abs/1705.03590v1.
[10] Wang Yashen, Huang Heyan, Feng Chong. A community detection algorithm based on minimum-cut graph partitioning[J]. Journal of Chinese Information Processing, 2017, 31(3): 213-222. (in Chinese)
[11] Chen J J, Chen J M, Liu J, et al. PSCAN: A Parallel Structural Clustering Algorithm for networks[C]//Proceedings of International Conference on Machine Learning and Cybernetics. IEEE Press, 2014: 839-844.
[12] Jing L P, Ng M K, Huang J Z. An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(8): 1026-1041.
[13] Li P, Wang H, Zhu K Q, et al. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity[J]. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(10): 2604-2617.
[14] Condon A, Karp R M. Algorithms for Graph Partitioning on the Planted Partition Model[J]. Random Structures & Algorithms, 2001, 18(2): 221-232.
[15] Manning C D, Raghavan P, Schütze H. Introduction to information retrieval[J]. Journal of the American Society for Information Science & Technology, 2008, 43(3): 824-825.
[16] Newman M E. Modularity and community structure in networks[C]//Proceedings of APS March Meeting. American Physical Society, 2006: 8577-8582.
[17] Ma Huifang, Chen Haibo, Zhao Weizhong, et al. Overlapping community detection for microblog users integrating average tag partition distance and structural relations[J]. Acta Electronica Sinica, 2018, 41(9): 1025-1036. (in Chinese)
[18] Girvan M, Newman M E. Community structure in social and biological networks[J]. Proceedings of the National Academy of Sciences, 2002, 99(12): 7821-7826.
(1) http://www-personal.umich.edu/~mejn/netdata/
(2) http://bailando.sims.berkeley.edu/enron/
(3) https://dblp.uni-trier.de/xml/
(4) http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html
