Research on Topic Discovery and Evolution Based on Time Series Clustering
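  • English title: Research on Topic Discovery and Evolution Based on Time Series Clustering
  • Authors: Li Hailin (李海林); Wu Xianli (邬先利)
  • Keywords: AP clustering; time series clustering; topic discovery; topic evolution
  • Journal: 情报学报 (Journal of the China Society for Scientific and Technical Information)
  • Affiliation: College of Business Administration, Huaqiao University
  • Publication date: 2019-10-24
  • Publisher: Journal of the China Society for Scientific and Technical Information
  • Year: 2019
  • Issue: 10
  • Funding: National Natural Science Foundation of China project "Research on Clustering Analysis of High-Dimensional Time Series Data and Its Applications" (71771094); Fujian Provincial Social Science Planning Project "Analysis of Journal References and Citing Literature Based on Time Series Data Mining" (FJ2017B065)
  • Language: Chinese
  • Pages: 49-58
  • Page count: 10
  • CN: 11-2257/G3
  • ISSN: 1000-0135
  • Classification codes: TP391.1; O211.61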
Abstract
Existing studies rely on a narrow range of methods for topic discovery and evolution analysis in the literature. To address this, this paper proposes a method for topic discovery and evolution analysis based on time series clustering. The method first uses co-word analysis to build the co-occurrence matrix of high-frequency keywords in a document dataset, converts the co-occurrence matrix into a similarity matrix using the Ochiai coefficient, and then applies the affinity propagation (AP) clustering algorithm to discover topics. The research popularity of each topic over a given period is then analyzed and converted into time series data, and a time series clustering method is used to group the topics and analyze their evolution trends. In experiments on journal articles related to innovation management indexed in CNKI from 2000 to 2018, the proposed method effectively discovered the research topics of the journals and analyzed the evolution trends of these topics well.
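The step that turns keyword co-occurrence counts into a similarity matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ochiai_similarity`, the toy keyword lists, and the choice to count each keyword at most once per document are all hypothetical. It uses the standard Ochiai coefficient, the co-occurrence count of two keywords divided by the square root of the product of their individual document frequencies.

```python
from itertools import combinations
from math import sqrt


def ochiai_similarity(docs):
    """Build an Ochiai similarity table from per-document keyword lists.

    docs: list of keyword lists, one list per document (hypothetical input format).
    Returns (keywords, sim), where sim[(a, b)] = c_ab / sqrt(c_a * c_b).
    """
    freq = {}  # number of documents containing each keyword
    co = {}    # number of documents containing both keywords of a pair
    for kws in docs:
        uniq = sorted(set(kws))  # assumption: count a keyword once per document
        for k in uniq:
            freq[k] = freq.get(k, 0) + 1
        for a, b in combinations(uniq, 2):  # pairs come out with a < b
            co[(a, b)] = co.get((a, b), 0) + 1

    keywords = sorted(freq)
    sim = {}
    for a in keywords:
        for b in keywords:
            if a == b:
                sim[(a, b)] = 1.0
            else:
                key = (a, b) if a < b else (b, a)
                sim[(a, b)] = co.get(key, 0) / sqrt(freq[a] * freq[b])
    return keywords, sim


# Toy data, invented purely for illustration.
docs = [
    ["innovation", "knowledge management"],
    ["innovation", "performance"],
    ["innovation", "knowledge management", "performance"],
]
keywords, sim = ochiai_similarity(docs)
for a, b in combinations(keywords, 2):
    print(f"{a} / {b}: {sim[(a, b)]:.3f}")
```

A similarity table of this kind could then be passed to an affinity propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`. Whether the paper used that library is not stated in the record.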
