用户名: 密码: 验证码:
知识图谱复杂网络特性的实证研究与分析
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:Empirical study of knowledge network based on complex network theory
  • 作者:丁连红 ; 孙斌 ; 时鹏
  • 英文作者:Ding Lian-Hong;Sun Bin;Shi Peng;School of Information, Beijing Wuzi University;National Center for Materials Service Safety, University of Science and Technology Beijing;
  • 关键词:微软概念图 ; 复杂网络 ; 子网抽取 ; 近似平均路径
  • 英文关键词:microsoft concept graph;;complex network;;subnet extraction;;approximate average path
  • 中文刊名:WLXB
  • 英文刊名:Acta Physica Sinica
  • 机构:北京物资学院信息学院;北京科技大学国家材料服役安全科学中心;
  • 出版日期:2019-05-30 14:01
  • 出版单位:物理学报
  • 年:2019
  • 期:v.68
  • 基金:国家重点研发计划(批准号:2017YFB0203703);; 国家自然科学基金重点项目(批准号:71831001);; 北京市教委科技计划一般项目(批准号:KM201910037186)、京市教委社科计划一般项目(批准号:SM201610037001);; 北京物资学院高水平科研团队协同基金(批准号:2017GG05)资助的课题~~
  • 语种:中文;
  • 页:WLXB201912034
  • 页数:15
  • CN:12
  • ISSN:11-1958/O4
  • 分类号:324-338
摘要
知识图谱是人工智能研究的新热点,用于解决智能检索、自动回答等应用问题.概念图谱是微软发布的大型知识图谱,本文以概念图谱为研究对象构建了概念图谱网络,并对其特性进行了分析.为解决概念图谱海量数据带来的计算困难,提出一种适合概念图谱的最大子网构建方法和一种网络近似平均路径的计算方法;对不同度值给出了不同的聚类系数求解方法,并通过Map/Reduce模式进行提速.结果表明:概念图谱网络具有无标度和极端小世界网络的特征;平均路径长度随网络规模增加而减小并趋于4这一定值,网络的"菱形"结构能很好地解释这一现象;概念图谱网络是异构的,相邻节点的度负相关;k-shell分解表明子概念占核心层节点的99.5%,在概念图谱中起重要的连接作用;子概念的缺失对概念图谱的完整性影响最大、概念其次、实例最小;82%的实例节点度为1, 40%的概念节点度为1,实例比概念更不易因一词多义而引起理解歧义.
        Knowledge graph is a hot topic in artificial intelligence area and has been widely adopted in intelligent search and question-and-answer system. Knowledge graph can be regarded as a complex network system and analyzed by complex network theory, which studies the interaction or relationship between various factors and basic characteristics of complex system. Its characteristics and their physical meanings are very helpful in understanding the nature of the knowledge graph. Concept graph is a large-scaled knowledge graph published by Microsoft. In this paper, we construct a huge complex network according to Microsoft's concept graph. Its complex network characteristics, such as degree distribution, average shortest distance, clustering coefficient and degree correlation, are calculated and analyzed. The concept graph is not a connected network and its scale is very large; an approach is proposed to extract its largest connected subnet. The method has obvious advantages in both time complexity and space complexity. In this paper, we also present a method of calculating the approximate average shortest path of the largest connected subnet. The method estimates the maximum and minimum value of the shortest distance between nodes according to the distance between the central node and the network layer that the node belongs to and the distance between different layers. In order to calculate the clustering coefficient, different methods are introduced for nodes with different degree values and Map/Reduce idea is adopted to reduce the time cost. The experimental results show that the largest subnet of the concept graph is an ultra-small world network with the characteristics of scale-free. The average shortest path length decreases towards 4 with the network size increasing, which can be easily explained by the diamond-shaped network structure. The concept graph is a disassortative network where low degree nodes tend to connect to high degree nodes. The subConcepts account for 99.5% of nodes in the innermost k-core after kshell decomposition. It shows that the subConcepts play an important role in the connectivity of network. The absence of subConcept affects the complexness of concept graph most, the concept next, and the instance least.The 82% instance nodes and 40% concept nodes of the concept graph each have a degree value of 1. It is believed that compared with the concept words, the instance words do not lead to the ambiguity in the understanding of natural language, caused by polysemy.
引文
[1] Wang Z Y, Wang H X, Wen J R, Xiao Y H 2015 ACM International Conference on Information and Knowledge Management Melbourne, Australia, October 18-23, 2015 p653
    [2] Hagberg A A, Schult D A, Swart P J 2008Proceedings of the7th Python in Science Conference Pasadena, CA USA,August 19-24, 2008 pll
    [3] Watts D J, Strogatz S H 1998 Nature 393 440
    [4] Barabasi A L, Albert R 1999 Science 286 509
    [5] Liu Z H, Zeng Y, Wu H L, Ma J F 2014 Journal of Computer Research and Development 51 2788(in Chinese)[刘志宏,曾勇,吴宏亮,马建峰2014计算机研究与发展51 2788]
    [6] An H Z, Zhong W Q, Chen Y R, Li H J, Gao X Y 2014Energy 74 254
    [7] Almog A, Squartini T, Garlaschelli D 2015 New J Phys. 17013009
    [8] Xing X, Yu D X, Tian X J, Wang S G 2017 Acta Phys. Sin.66 230501(in Chinese)[邢雪,于德新,田秀娟,王世广2017物理学报66 230501]
    [9] Colombo A, Campos G R D, Rossa F D 2017 IEEE Trans.Autom. Control 62 4933
    [10] Wan X, Li Q M, Yuan J F, Schonfeld P M 2015 Accid. Anal.Prev. 82 90
    [11] Ye K H, Yuan X 2018 Resources Development&Market 3459(in Chinese)[叶堃晖,袁欣2018资源开发与市场34 59]
    [12] Hu Y H, Zhu D L 2009 Physica A 388 2061
    [13] Pien K C, Han K, Shang W L, Majumdar A 2015Transportmetrica A 11 772
    [14] Albert R, Jeong H, Barabasi A L 2000 Nature 406 378
    [15] Broder A, Kumar R, Maghoul F, Raghavan P, RajagopalanS, Stata R, Tomkins A, Wiener J 2000 Comput. Netw. 33 309
    [16] Albert R, Jeong H, Barabasi A L 1999 Nature 401 130
    [17] Aiello W, Chung F, Lu L Y 2001 42nd Annual Symposium on Foundations of Computer Science Las Vegas, NV, USA October 14-17, 2001 p510
    [18] Aiello W, Chung F, Lu L Y 2000 Proceedings of the Thirtysecond Annual ACM Symposium on Theory of Computing Portland, Oregon, USA, May 21-23, 2000 p171
    [19] Rattigan M, Maier M, Jensen D 2006 Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining Philadelphia, PA, USA, August20-23, 2006 p357
    [20] Fiedor P 2015 Acta Phys.Pol. A 127 863
    [21] Wang G J, Xie C, Stanley H E 2018 Comput. Econ. 51 607
    [22] Qiu L, Jia T M, Yang H J 2016 Acta Phys. Sin. 65 198901(in Chinese)[邱路,贾天明,杨会杰2016物理学报65 198901]
    [23] Sun Y F, Wang C Y 2018 Acta Phys. Sin. 67 148901(in Chinese)[孙延风,王朝勇2018物理学报67 148901]
    [24] Ruan Y R, Lao S Y, Wang J D, Bai L, Chen L D 2017 Acta Phys.Sin.66 038902(in Chinese)[阮逸润,老松杨,王俊德,白亮,陈立栋2017物理学报66 038902]
    [25] Kitsak M, Gallos L K, Havlin S, Liljeros F, Muchnik L,Stanley H E, Makse H A 2010 Nat. Phys. 6 888
    [26] Li Q, Zhou T, Lu L Y, Chen D B 2014 Physica A 404 47
    [27] Ruan Y R, Lao S Y, Xiao Y D, Wang J D, Bai L 2016 Chin.Phys. Lett. 33 028901
    [28] Kong J T, Huang J, Gong J X, Li E Y 2018 Acta Phys. Sin.67 098901(in Chinese)[孔江涛,黄健,龚建兴,李尔玉2018物理学报67 098901]
    [29] Han D D, Yao Q Q, Chen Q, Qian J H 2017 Acta Phys. Sin.66 248901(in Chinese)[韩定定,姚青青,陈趣,钱江海2017物理学报66 248901]
    [30] Niu R W, Pan G J 2016 Chin. Phys. Lett. 33 068901
    [31] Jiang J, Zhang R, Guo L, Li W, Cai X 2016 Chin. Phys. Lett.33 108901
    [32] Tang J Y, Wang T, Wang W 2011 Journal of Software 222279(in Chinese)[唐晋韬,王挺,王戟2011软件学报222279]
    [33] NetworkX Developers https://networkx.github.io/documentation/networkxl.9/modules/networkx/algorithms/components/connected.html#connected_component_subgraphs[2014-6-21]
    [34] Newman M E J 2003 Phys. Rev. E 67 026126
    [35] Holme P, Kim B J, Yoon C N, Yoon C N, Han S K 2002Phys. Rev. E 65 056109

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700