Research on the Application of Data Mining in Case-Based Reasoning and Geographic Information Systems
Abstract
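Data mining is the process of extracting implicit, previously unknown, yet potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. Data mining technology is application-oriented. It goes beyond simple retrieval and querying of a particular database: it performs statistics, analysis, synthesis, and inference on the data at the micro, meso, and macro levels in order to guide the solution of practical problems, to discover relationships among events, and even to predict future activity from existing data.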
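     Artificial intelligence has long been one of the frontiers of computer theory and application research, but the knowledge-acquisition bottleneck has consistently constrained the progress of AI researchers. Case-based reasoning (CBR) systems address this problem fairly well; they are now widely applied to many kinds of problem solving and have very good application prospects. However, building a CBR system itself also requires a great deal of knowledge acquisition. Can data mining techniques be used to obtain implicit knowledge and thereby further reduce a CBR system's dependence on domain experts? This thesis focuses on how to apply data mining techniques to CBR.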
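     A geographic information system (GIS), a technology that emerged in the 1960s, is a spatial information system that integrates data collection, storage, management, and analysis, and that can describe information about the earth's surface (including the atmosphere) as well as data related to spatial and geographic distribution. With the rapid development of computer technology and growing social demand, GIS technology has gradually matured and its fields of application keep expanding. Users no longer limit their requirements of GIS to simple layer display and automated mapping; they expect to obtain more knowledge from it, which has led to proposals to introduce data mining and intelligent decision-making into GIS. This thesis focuses on how to apply data mining techniques to GIS.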
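     The thesis has six chapters. Chapter 1 surveys data mining technology, case-based reasoning, and geographic information systems, explains the basis and significance of the topic, and sets out the research direction and focus of the thesis.
     Chapter 2 introduces some basic concepts of data mining, examines in depth its main techniques (outlier analysis, clustering, and classification), and on that basis presents in detail the main data mining algorithms used in the later chapters.
     Chapter 3 introduces the basic concepts and principles of CBR and the characteristics of CBR systems, and on that basis examines in depth the key techniques of CBR.
     Chapter 4 studies the application of data mining in CBR. It first examines the main data mining techniques and methods used in CBR; then, for different application requirements, it proposes two case-base construction algorithms and one case-base maintenance algorithm, which apply the main data mining techniques (association analysis, outlier analysis, clustering, and classification). Experimental examples are given, and the results show that the algorithms effectively improve the degree of automation of knowledge acquisition in CBR systems and the performance of those systems.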
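     Chapter 5 studies the application of data mining in GIS. It first introduces some basic GIS concepts, then studies the basic techniques of spatial data analysis, proposes a spatial data mining architecture for GIS based on expert systems and case-based reasoning, and finally develops a practical geographic information system and applies data mining techniques to it.
     Chapter 6 summarizes the thesis and outlines future research.
     (Master's thesis, Anhui University.)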
Data mining is the process of extracting hidden, previously unknown, but potentially useful information and knowledge from vast, incomplete, noisy, fuzzy, and random data. Data mining technology is application-oriented. It is not limited to simple search and query; it also performs statistics, analysis, synthesis, and reasoning on the data at the microscopic and macroscopic levels to guide the solution of practical problems, to discover relationships among events, and even to make forecasts from known data.
    Research in artificial intelligence (AI) has always been at the forefront of computer theory and application research, but the knowledge-acquisition bottleneck hampers the progress of AI researchers. Case-based reasoning (CBR) handles this problem comparatively well and is widely applied to many kinds of problem solving, so its application prospects are very good. However, constructing a CBR system itself also requires a great deal of knowledge acquisition. Can data mining technology uncover hidden knowledge and reduce the reliance on domain experts? This thesis investigates the application of data mining in CBR.
    The geographic information system (GIS) has been under development since the 1960s. It integrates data collection, storage, management, and analysis, and it can describe information about the earth's surface (including the atmosphere) together with data on spatial and geographic distribution. With the development of computer technology and growing social demand, GIS technology is maturing and its range of applications is widening continuously. Users no longer want GIS merely to display maps and produce them automatically; they expect to obtain more knowledge from it. For this reason, data mining (DM) and intelligent decision support systems (IDSS) are being introduced into GIS. This thesis investigates the application of data mining in GIS.
    The thesis consists of six chapters. The first chapter surveys data mining technology, CBR, and GIS, explains the basis and significance of the thesis, and sets out its research direction and emphasis.
    The second chapter introduces some basic concepts of data mining and investigates its key techniques, including outlier analysis, clustering, and classification. It also presents in detail the main data mining algorithms that the later chapters rely on.
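    The abstract names outlier analysis as a key technique but does not say which method the thesis uses. As a rough illustration only, the sketch below applies one common distance-based criterion: a point is flagged when at least a fraction p of the other points lie farther than a distance d from it. The function names, the parameters `p` and `d`, and the sample data are assumptions made for this example and are not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_based_outliers(points, p, d):
    """Return indices of points for which at least a fraction p of the
    OTHER points lie farther away than d (an illustrative variant of
    distance-based outlier detection, not the thesis's algorithm)."""
    n = len(points)
    outliers = []
    for i, point in enumerate(points):
        if n < 2:
            break  # a single point has no neighbours to compare against
        far = sum(
            1
            for j, other in enumerate(points)
            if j != i and euclidean(point, other) > d
        )
        if far / (n - 1) >= p:
            outliers.append(i)
    return outliers


# Hypothetical data: four nearby points and one distant point.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0), (8.0, 8.0)]
print(distance_based_outliers(points, p=0.9, d=3.0))  # prints [4]
```

    This brute-force version compares every pair of points, so it is quadratic in the number of points and suited only to small data sets.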
    The third chapter introduces some basic concepts and principles of CBR, and investigates and discusses the key techniques of CBR.
    The fourth chapter investigates the application of data mining in CBR. First, the main data mining techniques and methods used in CBR are investigated and discussed. Second, to meet different application requirements, two algorithms for case-base construction and one algorithm for case-base maintenance are proposed, to which association analysis, outlier analysis, clustering, and classification are applied. We implement the algorithms and analyze the results. Our experiments show that the algorithms effectively improve the degree of automation of knowledge acquisition and the performance of the CBR system.
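    The abstract does not describe the construction algorithms themselves. To make the general idea concrete, here is a minimal sketch, under stated assumptions, of how clustering can reduce raw records to a small case base that is then queried by nearest-neighbour retrieval. It uses a simple single-pass "leader" clustering chosen for brevity; the functions `build_case_base` and `retrieve`, the `threshold` parameter, and the weather-style records are all hypothetical and are not the thesis's method or data.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_case_base(records, threshold):
    """Single-pass leader clustering over (features, solution) records.

    A record joins the first cluster whose leader is within `threshold`;
    otherwise it starts a new cluster. Each cluster becomes one case:
    the leader's features plus the cluster's most frequent solution.
    """
    clusters = []  # list of (leader_features, Counter of solutions)
    for features, solution in records:
        for leader, votes in clusters:
            if distance(features, leader) <= threshold:
                votes[solution] += 1
                break
        else:
            clusters.append((features, Counter({solution: 1})))
    return [(leader, votes.most_common(1)[0][0]) for leader, votes in clusters]


def retrieve(case_base, query):
    """Return the stored case whose features are closest to the query."""
    return min(case_base, key=lambda case: distance(case[0], query))


# Hypothetical records: (temperature, humidity) -> observed outcome.
records = [
    ((20.0, 90.0), "rain"),
    ((21.0, 88.0), "rain"),
    ((30.0, 40.0), "dry"),
    ((31.0, 42.0), "dry"),
    ((19.5, 91.0), "rain"),
]
case_base = build_case_base(records, threshold=5.0)
print(len(case_base))                         # prints 2
print(retrieve(case_base, (29.0, 45.0))[1])   # prints dry
```

    Five records collapse into two cases here. The result of leader clustering depends on record order and on the threshold, which is one reason a real system would need a more careful construction and maintenance procedure.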
    The fifth chapter investigates the application of data mining in GIS. First, some basic concepts of GIS are introduced. Second, the main techniques of spatial data analysis are investigated and discussed. Then a system framework for spatial data mining based on expert systems and CBR is put forward. Finally, we develop a practical GIS application to which data mining technology is applied.
    The last chapter summarizes the thesis and outlines prospects for further research.
