Mobile social network oriented user feature recognition of age and sex (面向移动社会网络的用户年龄与性别特征识别)
  • English title: Mobile social network oriented user feature recognition of age and sex
  • Authors: LI Yuanhao (李源昊); LU Ping (陆平); WU Yifan (吴一凡); WEI Wei (韦薇); SONG Guojie (宋国杰)
  • Affiliations: School of Electronics Engineering and Computer Science, Peking University; Zhongxing Telecommunication Equipment Corporation
  • Keywords: mobile social network; social network analysis; feature recognition; relational Markov network
  • Journal code: JSJY
  • Journal: Journal of Computer Applications
  • Publication date: 2016-02-10
  • Publisher: Journal of Computer Applications (计算机应用)
  • Year: 2016
  • Issue: v.36; No.306
  • Funding: National High Technology Research and Development Program of China (863 Program) (2014AA015103); National Key Technology R&D Program of China (2014BAG01B02); Beijing Natural Science Foundation (4152023); ZTE Research Fund
  • Language: Chinese
  • Page: JSJY201602016
  • Number of pages: 8
  • CN: 02
  • ISSN: 51-1307/TP
  • Classification code: 80-87
Abstract
Mobile social network data has a complex network structure, the labels of neighboring nodes influence one another, and the data mixes several kinds of information, including interaction and location records. These properties make it hard to identify user characteristics. To address this, a real mobile network dataset was analyzed: statistical analysis was used to extract the differences between labeled users with different characteristics, and a relational Markov network prediction model built on those differences was used to recognize the age and gender of unlabeled users. The analysis shows that users of different ages and genders differ markedly in call probability and call entropy across time periods, in the distribution and dispersion of their locations, in their degree of clustering in the social network, and in the frequency of their binary (pairwise) and ternary (three-way) interactions. Based on these features, a method is proposed that uses relational clique templates for binary and ternary interactions, combined with each user's own temporal and spatial features, to compute the full joint probability distribution of user characteristics with a relational Markov network, and then infers each user's age and gender from that distribution. Experiments show that, compared with traditional classifiers such as the C4.5 decision tree, random forest, logistic regression, and naive Bayes, the proposed method improves prediction accuracy by up to about 8%.
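The abstract names call probability by time period and call entropy as discriminating features but does not define them in this record. The following is a minimal sketch under the assumption that call entropy is the Shannon entropy of a user's calls over hour-of-day bins and that call probability is the share of calls falling in a given time window. The function names `call_entropy` and `call_probability` and the hour-based input format are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
import math
from collections import Counter


def call_entropy(call_hours):
    """Shannon entropy (in bits) of a user's calls over hour-of-day bins.

    Assumption: the paper's exact definition is not given in this record;
    this uses the standard Shannon entropy of the empirical distribution.
    """
    counts = Counter(call_hours)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def call_probability(call_hours, start, end):
    """Fraction of a user's calls whose hour lies in [start, end)."""
    if not call_hours:
        return 0.0
    in_window = sum(1 for h in call_hours if start <= h < end)
    return in_window / len(call_hours)


# Hypothetical users: one always calls at 09:00, one spreads calls evenly.
regular_user = [9, 9, 9, 9]
spread_user = [0, 6, 12, 18]
```

Under these assumptions, a user who always calls at the same hour has zero entropy, while a user whose calls are spread evenly over several hours has higher entropy. Per-user values of this kind could serve as node-level features alongside the relational features the abstract describes.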
