数据挖掘中关联规则算法的研究及应用

设为首页

收藏本站

网站地图 | English | 公务邮箱

远程访问

NSTL服务站

数据挖掘中关联规则算法的研究及应用

详细信息本馆镜像全文| 推荐本文 | | 获取CNKI官网全文

英文题名：Research and Application of Association Rull Mining Algorithm in the Data Mining
作者：梅俊
论文级别：硕士
学科专业名称：计算机应用技术
中文关键词：数据挖掘 ; 关联规则 ; 最大频繁项目 ; 增量更新 ; 频繁模式
英文关键词：data mining ; association rule ; maximum frequent item sets ; incremental updating ; frequent patter
学位年度：2010
导师：郑刚
学科代码：081203
学位授予单位：安徽工程大学
论文提交日期：2010-06-08

摘要

数据挖掘是从大量的、不完全的、有噪声的、模糊的、随机的数据中,提取隐含在其中的、人们事先不知道的、但又是潜在有用的信息和知识的过程,其主要目标是从大型的数据库中挖掘出对用户有价值的信息。其中关联规则挖掘是数据挖掘的一个重要研究分支,主要用于发现数据集中项之间的相关联系。由于关联规则形式简洁、易于解释和理解并可以有效地捕捉数据间的重要关系,因此从大型数据库中挖掘关联规则问题已成为数据挖掘中最成熟、最重要、最活跃的研究内容。
     本文对数据挖掘技术,尤其是关联规则数据挖掘技术进行了全面地分析和研究,在先前研究的基础上,提出解决相应问题的关联规则挖掘算法。论文的主要内容包括以下四个方面：
     第一、数据挖掘技术、关联规则挖掘技术的分析与研究。文中详细地介绍了数据挖掘基本概念,并对数据挖掘的过程、数据挖掘的应用领域以及数据挖掘的常用技术进行分类、归纳和总结,并且对数据挖掘技术的国内外研究现状进行分析；文中还对关联规则的定义、性质、基本步骤做了系统地阐述,分析研究关联规则挖掘的经典挖掘算法Apriori以及基于Apriori算法的的改进方法,另外,对不产生候选挖掘频繁项集的FP-growth算法的过程、思想进行了详细地描述。
     第二、深入研究了关联规则中最大频繁项目集,提出一种基于FP-tree结构的最大频繁模式挖掘算法DMFIA-D。通过实例说明该DMFIA-D算法执行过程,并通过试验证明该算法与DMFIA算法相比更具有优越性,试验还验证了算法的可扩展性。DMFIA-D算法对FP-tree结构进行了改进,充分利用FP-tree结构特征,并运用双向搜索策略,自顶向下选取最大频繁候选项集,自底向上对候选项集进行计数、剪枝最终确定最大频繁项目集。由于减少了最大频繁候选集,并对候选集进行有效剪枝,从而缩短了算法的挖掘时间,提高效率。
     第三、文中研究了增量更新算法FUP,提出一种基于临时表的改进算法MFUP。实例说明了MFUP算法的执行过程,实验验证了MFUP算法的优越性。通过对FUP算法进行分析,指出它的优缺点,针对FUP算法的不足,提出改进算法MFUP。该算法通过建立临时表,来存放增量数据库的频繁项集,充分利用原数据库挖掘的结果,尽早的删除了更新数据库的非频繁项目集,从而大大减少了对数据的重复扫描,提高了数据挖掘算法的效率。
     第四、研究探讨了算法DMFIA-D在超市系统分析中会员消费情况的应用尝试。为超市系统针对会员消费情况制定销售策略、促销活动等提供辅助决策信息。
Data mining is to reveal the implicated but useful information from massive, noisy, fuzzy, incomplete dataset. Its essential target is to extract valuable information from the large-scale database. Association rule mining is an important branch of data mining, which mainly use to find relevant contact. Because the form of association rule mining is succinct, easy to explain and understanding, and may catch the data effectively the important relation. Now the question of association rule mining from the large-scale database has become the most active, matures, importantly research content in the data mining.
     In this thesis, we research and analyse the data mining technology, especially the association rule mining technology. Based on the previous research, we put forward corresponding algorithm of mining association rules for the problems which has found in the research process. The thesis mainly includes the following four aspects:
     First, the data mining technology, the association rules mining technical are analysed and researched. We introduce the basic concept of data mining, and classifies and deduces and summarizes the process of data mining, the application field of data mining and the common technology of the data mining, the domestic and overseas research situation of the data mining in the paper. Meanwhile, we expatiate the basis concept of the association rules by the numbers, and deduce the classification of the association rules and the basis steps of the association rules. We also research the classic algorithm Apriori of the association rule, the improving method based on of the Apriori algorithm and the FP-growth algorithm of no candidate mining of frequent item sets.
     Second, we have researched the maximum frequent item sets,and proposed an DMFIA-D algorithm for mining maximum frequent item sets based on FP-tree.We explained the algorithm through the example,and validated the superiority and expansibility of the algorithm.Based on the research of the concept of maximum frequent item sets and the existing maximum frequent item sets algorithms, We propose an algorithm for mining maximum frequent item sets based on FP-tree, which algorithm improves FP-tree structure, and makes full use of FP-tree structural features, and uses bi-directional search strategy. The bi-directional search strategy means that the top-down search the candidates item sets of the maximum frequent and the bottom-up count or cut the candidates first, and finally make sure the maximum frequent item sets. Because of cutting down the candidates, so it reduces the time of the algorithm mining,.and inceases the efficiency.
     Thirdly, in the paper we also research the incremental updating algorithm FUP, and analyse the FUP algorithm, and propose the advantages and the disadvantages of the FUP algorithm. We provide a new algorithm MFUP which algorithm based on temporary table for mining association rules. The MFUP algorithm made full use of the old data mining rules and reduced the times of scaning the database greatly, thus the data mining efficiency increased. The example and the experiment in the paper shows that MFUP is better than FUP.
     At last, we study the problem of the application of mining maximal frequent patterns algorithm DMFIA-D in the analysis of the supermarket system. Mei Jun(Computer Application Technology) Supervised by Zheng Gang

引文

[1]J.Han and M. Kamber著,范明、孟小峰等译.数据挖掘概念与技术[M].北京：北京机械工业出版社,2001：149-179.
    [2]Agrawal R,Imielinski T, Swami A. Mining Association Rules between Sets of Items in Large Databases[A]. In:Proc 1993 ACM_SIGMOD Conf Management of Data. Washington, DC:ACM Press,1993,207-216.
    [3]Quinlan J.R. C4.5:Programs for Machine Learning[C]. San Mateo, California, Morgan Kauf-mann,1993.
    [4]Chen M S, Han J W, Yu P S. Data mining:an overview from a database perspective[J]. IEEE Transaction on Knowledge and Data Engineering,1996, 8(6),866-883.
    [5]J.S. Park, M. S. Chen,P.S.Yu. An Effective Hash Based Algorithm for Mining Association Rules. Michael J.Carey and Donovan A.Schneider,Proceedings of the ACM-SIGMOD International Conference On Management of Data(SIGMOD'95), San Jose,California,1995,ACM Press Publisher,1995:175-186.
    [6]A.Savasere, E.Omiecinski,S.Navathe.An efficient algorithm for mining association rules in large databases. Umeshwar Dayal, PeterM.D.Gray, and Shojiro Nishio. Proceedings of the 21st International Conference on Very Large Databases (VLDB'95), Zurich,Switzerland,1995,MorganKaufmannPublisher,1995:432-443.
    [7]Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In:Chen WD,Naughton J, Bernstein PA,eds, Proc.Of the 2000 ACM SIGMOD Int'l Conf. on Management of Data(SIGMOD 2000).New York:ACM Press,2000:1-12.
    [8]R.J.Bayardo. Efficiently Mining Long Patterns from Databases[C]. ACM SIGMOD Conference Proceedings,1999:85-93.
    [9]D.Lin,Z.M.Kedem.Pincer-Search:A New Algorithm for Discovering the Maximum Frequent Itemset [C]. EDBT Conference Proceedings.1998:105-110.
    [10]D.Burdick,M.Calimlim,J.Gehrke. MAFIA:A Maximal Frequent Itemset Algorithm for Transactional Databases[C]. Proceedings of the ICDE Conference,2001.
    [11]路松峰,卢正鼎.快速开采最大频繁项目集[J].软件学报,2001(2)：293-297.
    [12]宋余庆,朱玉全,孙志挥,等.基于FP-Tree的最大频繁项目集挖掘及更新算法[J]. 软件学报,2003,14(9)：1586-1592.
    [13]Cheung D W. Maintenance of Discovered Association Rules in Large Databases: An Incremental Update Technique[C]. In:Proceedings of the 12th International conference on Data Engineering, New Orleans, Louisana,1996,106-114.
    [14]D Dunopulos,H.Mannila,and S.Saluja. Discovering all the most specific sentences by randomized algorithms. In intl.Conf. on Database Theory,Jan 1997.
    [15]Lu flong jun, Rudy Setiono, Liu Huan, Effective data mining using neural networks[C]. IEEE Transactions on Knowledge and Date Engineering,1996,8(6): 957-961.
    [16]刘明吉等.一种基于遗传算法的知识挖掘算法[J].计算机工程,2000,26(S)：13-14
    [17]陆汝铃.人工智能[M].科学出版社,1996：823-844.
    [18]冯玉才,冯剑琳.关联规则的增量式更新算法[J].软件学报,1998,9(1)：301-306.
    [19]朱玉全等.约束关联规则的有效挖掘算法[J].计算机工程,2002,28(2)：29-31.
    [20]周皓峰等.一个基于兴趣度的关联规则采掘算法[J].计算机研究与发展,2000,5：627-633.
    [21]R.Agrawal and R.Srikant. Fast Algorithms for Mining Association Rules. Jorge B.Bocca, Matthias Jarke, and Carlo Zaniolo, Proceedings of the 20th International Conference on Very Large Databases(VLDB'94),Santiago,Chile,1994,Morgan Kaufmann Publisher,1994:487-499.
    [22]何小东,刘卫国.数据挖掘中关联规则挖掘算法比较研究[J].计算机工程与设计,2005,26(5)：1265-1268.
    [23]王小虎.关联规则挖掘综述[J].计算机工程与应用,2003,33：190-193.
    [24]J. Han and Y. Fu. Discovery of Multiple-Level Association Rules from Large Databases. UmeshwarDayal, Peter M.D. Gray, and Shojiro Nishio. Proceedings of the 21st International Conference on Very Large Databased (VLDB'95), Zurich, Switzerland,1995.Morgan Kaufmann Publisher,1995:420-431
    [25]H. Toivonen. Sampling large databases for association rules. T.M.Vijayaraman, Alejandro P.Buchmann,C.Mohan,and NandlalL,Sarda,Proceedings of the 22nd International Conference on Very Large Databases(VLDB'96),Bombay,India, 1996,Morgan Kaufmann Publisher,1996:134-145.
    [26]GrahneG,ZhuJF. High Performance mining of maximal frequent itenlsets[C]. In: Proe.of the 6th SIAMInt'l Work shoPon High Performance Data Mining (HPDM2003).SanFraneiseo,CA,2003:135-143.
    [27]Lu SF, Lu ZD. Fast mining maximum frequent itemsets. Journal of Software[J]. 2001,12(2):293-297.(in Chinese with English abstract).
    [28]R. C. Agrwal,C. C. Aggarwal,V. V. V. Prasad. Depth First Generation of long Patterns. Proceedings of the ACMSIGKDD Conference,2000.
    [29]G. Karam, J. Z. Mohammed. Efficiently Mining Maximal Frequent Itemsets. Proceedings of the ICDM Conference,2001.
    [30]陈旭辉,蒋红.基于双向搜索的最大频繁项目集挖掘算法[J].计算机工程与设计,2007,28(14)：3288-3290.
    [31]杨君锐.逆向启发式开采最大频繁项目集[J].计算机工程,2004,30(14)：116-118.
    [32]N F Ayan,etal. An Efficient Algorithm To Update Large Itemsets with Early Pruning[C]. Proc. of the 5th Int. Conf.on Knowledge Discovery and Data Mining (KDD'99),San Diego,California,USA,1999:287-291.
    [33]D.W. Cheung, S.D. Lee, B. Kao. A General Incremental Technique for Maintaining Discovered Association Rules[C]. In:Proc of the Fifth Int Conf. on Database Systems for Advanced Applications,Melbourne, Australia,April 1997,185-194.
    [34]S.D.Lee,D.W.Cheung. Maintenance of Discovered Association Rules:When to Update?[C]. In:Proc.of SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery,Tucson, Arizona,May 1997.
    [35]陈劲松等.一种增量更新算法[J].计算机工程,2002,28(7)：106-107.
    [36]朱玉全,汪晓刚.一种新的关联规则增量式更新算法[J].计算机工程,2002,28(4),25-27.
    [37]朱红蕾,李明.维护关联规则的算法研究[J].兰州理工大学学报,2004,30(5)：104-107.
    [38]商志会,陶树平.一种高效的关联规则增量更新算法[J].计算机应用,2005,25(4)：830-832.
    [39]徐文拴,辛运帏.一种改进的关联规则维护算法[J].计算机工程与应用,2006,18：178-180.
    [40]厉浩,李珊.一个新的FUP-Based关联规则增量式更新算法[J].计算机工程与科学.2005,27(7)：74-76.
    [41]孙浩,赵霁.一种关联规则增量更新算法[J].系统工程与电子技术.2004,26(5)：676-707.
    [42]张师超,张继连等.负增量式关联规则更新算法[J].计算机科学2005,32(9)：153-175.
    [43]程雁,闪四清.面向数据删除的关联规则更新算法[J].计算机工程,2005,31(17)：98-99.
    [44]石冰,郑燕峰.改进型关联规则增量式更新算法与实现[J].小型微型计算机系统,2000,21(12)：1327-1329.
    [45]宋海生.关联规则的增量式更新算法[J].兰州大学学报,2004,40(2)24-26.
    [46]高峰,谢剑英.发现关联规则的增量式更新算法[J].计算机工程,2000,26(12)：49-50.
    [47]皋军,王建东.关联规则挖掘算法更新与拓展[J].计算机工程与应用,2003,39(35)：178-179
    [48]Charu C. Aggarwal, PhiliP5.Yu. On line generation of association rules. Technical Report RC 20899(92609),IBM Research Division,T.J. Watson Research Center, York town Heights, NY,June 1997:3-SP.
    [49]周海岩.关联规则的开采与更新[J].软件学报,1999,10(10)：1078-1084.
    [50]李铭,蔡庆生.一个高效的关联规则增量式更新算法[J].计算机工程与应用,2000,5：47-49.
    [51]孙颖生.数据挖掘技术在零售业中的应用[D].长春理工大学,2007.
    [52]徐金宝.数据挖掘技术在超市客户关系管理系统中的应用[D].南京理工大学,2007.
    [53]吴洪波,赵谦.关联规则在大型超市中的应用[J].哈尔滨理工大学学报,2008(4)：128-134
    [54]张峰.商业供应链管理系统中数据挖掘的应用[D].中国海洋大学,2006.
    [55]袁柳.CRM中数据挖掘应用策略及其关键技术的研究[D].西安电子科技大学,2004.
    [56]田玲.银行客户关系管理的数据挖掘应用研究[D].四川大学,2003.
    [57]耿晓中.超市管理系统及数据挖掘技术在其上的应用[D].吉林大学,2004
    1.梅俊,郑刚.一种基于临时表的关联规则增量更新算法[J].安徽工程科技学院学报,2010.3：44-47
    2.梅俊,郑刚.一种基于FP-tree的最大频繁项目集挖掘算法[J].现代计算机,2009.9：33-36

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700