用户名: 密码: 验证码:
并行动态位向量频繁闭合序列模式挖掘算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:A parallel dynamic bit vector based frequent closed sequence pattern mining algorithm
  • 作者:陈倩 ; 刘云 ; 高钰莹
  • 英文作者:CHEN Qian;LIU Yun;GAO Yu-ying;Faculty of Information Engineering and Automation,Kunming University of Science and Technology;
  • 关键词:数据挖掘 ; 闭合序列模式 ; 动态位向量 ; 多核处理器 ; PDBV-FCSP算法
  • 英文关键词:data mining;;closed-sequence mode;;dynamic bit vector;;multi-core processor;;PDBV-FCSP algorithm
  • 中文刊名:JSJK
  • 英文刊名:Computer Engineering & Science
  • 机构:昆明理工大学信息工程与自动化学院;
  • 出版日期:2018-10-15
  • 出版单位:计算机工程与科学
  • 年:2018
  • 期:v.40;No.286
  • 基金:国家自然科学基金(61262040)
  • 语种:中文;
  • 页:JSJK201810001
  • 页数:9
  • CN:10
  • ISSN:43-1258/TP
  • 分类号:5-13
摘要
针对在时间和空间上都具有高计算成本的长序列数据库,一个更有效和更紧凑且可以完全提取信息的挖掘模式是当前的研究热点。提出一种并行动态位向量频繁闭合序列模式的挖掘算法(PDBVFCSP),该算法采用多核处理器架构和DBV数据结构相结合的方式,有效加快了序列数据库的处理速度,并对搜索空间进行划分,尽早执行预处理序列的闭合检查,减少了所需的存储空间和挖掘频繁闭合序列模式的执行时间,克服了现有并行挖掘算法通信开销、同步和数据复制等问题。利用重新分配工作的动态负载平衡机制,解决处理器之间的负载均衡问题,最大限度地减少了CPU空闲时间。对DBV-VDF算法和PDBV-FCSP(2-4核)算法进行仿真比较,结果表明,PDBV-FCSP算法在运行时间、内存使用和可伸缩性等方面都有较优的性能提升,且当内核数增加时,性能更优。
        For long sequence databases,which have high computational costs both in time and space,a mining model that is more efficient and compact and can extract information completely is a current research hotspot.We propose a parallel dynamic bit vector based frequent closed sequence pattern mining algorithm(PDBV-FCSP),which combines the multi-core processor architecture with the DBV data structure to effectively speed up the processing speed of the sequence database.The search space is divided,and the closed check of the pre-processing sequence is executed as early as possible,which reduces the required storage space and the execution time of mining the frequent closed sequence mode,and overcomes the problems of communication overhead,synchronization and data replication of the existing parallel mining algorithms.The dynamic load balancing mechanism for job redistribution is used to solve the load balancing problem of workloads among processors,thus minimizing the idle CPU time.Simulation results show that,compared with the DBV-VDF algorithm,the PDBV-FCSP algorithm has better performance in terms of running time,memory usage and scalability.And when the core number increases,the performance is better.
引文
[1] Shi Jie.A fast frequent pattern mining algorithm[J].Journal of Yantai University(Natural Science and Engineering Edition),2015,28(2):113-118.(in Chinese)
    [2] Liu Wei-ming,Kuai Hai-long,Chen Zhi-gang,et al.Maximum frequent item mining algorithm of uncertain data based on ordered tree[J].Computer Engineering and Applications,2015,51(24):145-149.(in Chinese)
    [3] Xu S,Su S,Cheng X,et al.Differentially private frequent sequence mining via sampling-based candidate pruning[C]∥Proc of IEEE International Conference on Data Engineering,2015:1035-1046.
    [4] Rashid M M,Gondal I,Kamruzzaman J.Share-frequent sensor patterns mining from wireless sensor network data[J].IEEE Transactions on Parallel&Distributed Systems,2015,26(12):3471-3484.
    [5] Tran M T,Le B,Vo B.Combination of dynamic bit vectors and transaction information for mining frequent closed sequences efficiently[J].Engineering Applications of Artificial Intelligence,2015,38:183-189.
    [6] Zhang Wen,Luo Ke.A parallel FP-Growth mining algorithm based on Spark framework[J].Computer Engineering&Sciences,2017,39(8):1403-1409.(in Chinese)
    [7] Xing Chang-zheng,An Wei-guo,Wang Xing.Improvement of algorithm for mining frequent itemsets in vertical data format[J].Computer Engineering&Sciences,2017,39(7):1365-1370.(in Chinese)
    [8] Beedkar K,Gemulla R.DESQ:Frequent sequence mining with subsequence constraints[C]∥Proc of IEEE ICDM’16,2016:1.
    [9] Zhu Yue-an,Zhou Xuan,Zhang Yan-song,et al.A review on performance optimization of transactional database under multi-core processor[J].Chinese Journal of Computers,2015,38(9):1865-1879.(in Chinese)
    [10] Colin A,Kandhalu A,Rajkumar R.Energy-efficient allocation of real-time applications onto single-isa heterogeneous multi-core processors[J].Journal of Signal Processing Systems,2016,84(1):91-110.
    [11] Luan Hua,Zhou Ming-quan,Fu Yan.Multi-core processor frequent graph mining method[J].Journal of Computer Research and Development,2015,52(12):2844-2856.(in Chinese)
    [12] Zhang Yong-xiong,Yu Bing-jun,Deng Zhi-hong.Application research of association rules based on bit vector in teaching evaluation[J].Journal of Langfang Teachers College(Natural Science Edition),2017,17(1):21-24.(in Chinese)
    [13] He Ya-wei,Hou Zheng-feng,Wu Liang-liang.An improved algorithm based on bit vector flow classification[J].Journal of Hefei University of Technology(Natural Science Edition),2015,38(3):331-335.(in Chinese)
    [14] Cordova J,Navarro G.Practical dynamic entropy-compressed bitvectors with applications[C]∥Proc of International Symposium on Experimental Algorithms,2016:105-117.
    [15] Vo B,Hong T-P,Le B.DBV-Miner:A dynamic bit-vector approach for fast mining frequent closed itemsets[J].Expert Systems with Applications,2012,39(8):7196-7206.
    [16] Kessl R.Probabilistic static load-balancing of parallel mining of frequent sequences[J].IEEE Transactions on Knowledge&Data Engineering,2016,28(5):1299-1311.
    [1]石杰.一种快速频繁模式挖掘算法[J].烟台大学学报(自然科学与工程版),2015,28(2):113-118.
    [2]刘卫明,蒯海龙,陈志刚,等.基于有序树的不确定数据最大频繁项挖掘算法[J].计算机工程与应用,2015,51(24):145-149.
    [6]张稳,罗可.一种基于Spark框架的并行FP-Growth挖掘算法[J].计算机工程与科学,2017,39(8):1403-1409.
    [7]邢长征,安维国,王星.垂直数据格式挖掘频繁项集算法的改进[J].计算机工程与科学,2017,39(7):1365-1370.
    [9]朱阅岸,周烜,张延松,等.多核处理器下事务型数据库性能优化技术综述[J].计算机学报,2015,38(9):1865-1879.
    [11]栾华,周明全,付艳.多核处理器上的频繁图挖掘方法[J].计算机研究与发展,2015,52(12):2844-2856.
    [12]张永雄,余丙军,邓志虹.基于位向量的关联规则算法在教学评价中的应用研究[J].廊坊师范学院学报(自然科学版),2017,17(1):21-24.
    [13]贺亚威,侯整风,吴亮亮.一种基于位向量流分类算法的改进[J].合肥工业大学学报(自然科学版),2015,38(3):331-335.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700