Research on the Theory and Methods of Fuzzy and Dynamic Multi-dimensional Data Modeling
Abstract
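Research on multi-dimensional data models provides the foundation for the widespread use of data warehouse and OLAP technologies and has significant theoretical and practical value. The dimension is a central concept in a multi-dimensional data model: its hierarchical structure lets analysts examine the facts of interest at different granularities. In existing multi-dimensional data models, dimension hierarchies are built on complete partitions, which gives them clear levels and a stable structure. In the real world, however, the information describing objects is often uncertain and fuzzy, and the objects themselves evolve dynamically, so it is difficult to build such clearly layered, structurally stable analysis dimensions from static, sharply bounded complete partitions. This thesis therefore takes multi-dimensional data modeling under fuzzy and dynamic conditions as its research goal. It proposes a multi-dimensional data model that supports fuzzy dimensions together with a clustering-based method for constructing them; proposes a multi-level sliding window model for continuous data streams and designs an online aggregation algorithm for such streams; and proposes a dynamic multi-dimensional data model for data streams together with its online multi-dimensional aggregation method. The main work and contributions of the thesis are the following four points: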
     1. A fuzzy multi-dimensional data model is proposed based on fuzzy quotient space theory
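     By introducing a fuzzy equivalence relation, the thesis proposes a fuzzy dimension structure model that supports incomplete partitions. Compared with an ordinary dimension, the fuzzy dimension is extended in two main respects: first, the element aggregation relation between two dimension levels is extended with a parameter λ, so that element aggregation can depend on λ; second, a λ-parameterized element aggregation relation is established within a level, so that elements can be aggregated over a layered, stepwise structure inside the level. The extension is compatible with existing models: an ordinary dimension is a special case of a fuzzy dimension.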
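     On the basis of the fuzzy dimension, the thesis gives formal descriptions of the fuzzy multi-dimensional data model, the fuzzy data cube, the aggregation operation, and the basic OLAP analysis operations of roll-up, drill-down, selection, projection, slice, and dice.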
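     The fuzzy aggregation problem is analyzed in depth using the theory and methods of fuzzy granular computing, and three ways of handling it are proposed: the conservative method, the optimistic method, and the element-derived set method. Compared with related multi-dimensional data models, the proposed fuzzy multi-dimensional data model overcomes the limitations of traditional multi-dimensional data modeling theory and is better able to describe and model uncertain, fuzzy multi-dimensional data analysis problems.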
     2. A clustering-based method for constructing fuzzy dimensions is proposed
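     Because the fuzzy equivalence relation is hard to determine in practice, the thesis proposes two construction methods chosen according to the size of the object set: one based on fuzzy clustering and one based on relative density clustering. It also proposes a clustering algorithm based on relative density that yields fairly stable results under different parameter settings, that is, the results are not overly sensitive to the parameters. In addition, high-density clusters can be distinguished from the low-density clusters connected to them, so clustering results at multiple density resolutions can be obtained.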
     3. A multi-level window model for data streams and an online aggregation algorithm are proposed
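     In data stream processing, detailed information is generally required for the most recent period, whereas an overview is usually enough for more distant periods. The thesis therefore proposes a multi-level time window model that supports modeling a data stream at different time granularities in different periods; designs a multi-granularity aggregate tree structure and a pyramidal snapshot storage structure for expired data; and proposes online aggregation and approximate query algorithms for data streams. Performance analysis shows that the algorithms meet the demanding requirements of online aggregation and query analysis over data streams in both storage space and processing time, and thus effectively solve the problem of aggregating and querying data streams under limited time and space.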
     4. A dynamic multi-dimensional data model for data streams and its online multi-dimensional aggregation method are proposed
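     Based on the time dimension schema of the multi-level time window model, the thesis proposes a dynamic multi-dimensional data model for data streams. Compared with the multi-dimensional data model of an ordinary data warehouse, its main advantage is that it supports a time dimension of unbounded span and a dynamically changing data set. Because the time dimension of a data stream is unbounded, no storage system can keep every data item over the whole time domain, so the multi-level time window model is the inevitable choice for modeling the time dimension of a data stream; and because the data set changes rapidly and continuously, a multi-dimensional data model for data streams should support online multi-dimensional aggregation.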
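     Because the observed attributes of a data stream tend to be surface-level, detailed, and technical, selecting and constructing dimensions for multi-dimensional online analytical processing of data streams is very difficult. The thesis proposes an online clustering algorithm that supports dynamic modeling of data stream dimensions, designs data structures that support online clustering and multi-dimensional aggregation of data streams, and proposes an online method for aggregating and materializing the basic units of a data stream.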
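     The research in this thesis on the theory and methods of fuzzy and dynamic multi-dimensional data modeling has theoretical and practical significance for promoting the close integration and wide application of data warehouse, OLAP, and data mining technologies.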
As the technical foundation underlying the widespread application of data warehouse and OLAP techniques, the study of multi-dimensional data models has important theoretical and practical value. The dimension is a very important concept in a multi-dimensional data model because its hierarchical structure allows people to analyze the facts of interest at different granularities. In existing multi-dimensional data models, the hierarchical structure of a dimension is based on complete partitions and therefore has clear levels and a stable structure. In the real world, however, the information describing an object is often uncertain and fuzzy, and the objects themselves are dynamic and evolving, so it is difficult to build an analysis dimension model with clear levels and a stable structure from static, sharply bounded complete partitions. Taking multi-dimensional data modeling under fuzzy and dynamic conditions as its research goal, this thesis proposes a multi-dimensional data model that supports fuzzy dimensions together with a clustering-based method for constructing them, proposes a multi-level sliding window model for continuous data streams and designs an online aggregation algorithm for it, and presents a dynamic multi-dimensional data model for data streams with the corresponding online multi-dimensional aggregation method. The main contributions and innovations of this thesis are:
     1. Proposes a fuzzy multi-dimensional data model based on fuzzy quotient space theory.
     A fuzzy dimension structure model that supports incomplete partitions is obtained by introducing a fuzzy equivalence relation. The fuzzy dimension proposed here extends the ordinary concept of dimension in two main respects: first, it extends the aggregation relation between two dimension levels with a parameter λ and supports aggregation operations that depend on λ; second, it establishes a λ-parameterized aggregation relation within a level and supports aggregation over a stepwise hierarchical structure inside the level. The extension is also compatible, i.e., an ordinary dimension is a special case of a fuzzy dimension.
     Based on the concept of fuzzy dimension, formal descriptions of the fuzzy multi-dimensional data model, the fuzzy data cube, the aggregation operation, and basic OLAP operations such as roll-up, drill-down, selection, projection, slice, and dice are also presented.
     Through an in-depth analysis of the fuzzy aggregation problem using the theories and methods of fuzzy granular computing, three processing methods are proposed: the conservative method, the optimistic method, and the element-derived set method. Compared with related work, the presented fuzzy multi-dimensional data model, which is grounded in fuzzy quotient space theory, overcomes the limitations of traditional multi-dimensional data modeling theory and offers stronger capabilities for describing and modeling uncertain and fuzzy multi-dimensional data analysis problems.
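
     The role of the parameter λ in the first contribution can be illustrated with a small sketch. It relies only on a standard fact from fuzzy set theory: if a fuzzy relation R is reflexive, symmetric, and max-min transitive, then for any λ the crisp relation {(x, y) : R(x, y) ≥ λ} is an equivalence relation, so lowering λ merges classes and yields a nested hierarchy of partitions. The member names, relation values, measure values, and function names below are illustrative assumptions and are not taken from the thesis; the thesis's own formal definitions are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical fuzzy equivalence relation over four dimension members.
# It is reflexive, symmetric and max-min transitive by construction.
MEMBERS = ["a", "b", "c", "d"]
RELATION = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.6],
    [0.4, 0.4, 0.6, 1.0],
]
MEASURE = {"a": 10, "b": 20, "c": 30, "d": 40}  # illustrative fact values


def lambda_cut_partition(
    members: Sequence[str], relation: Sequence[Sequence[float]], lam: float
) -> List[List[str]]:
    """Partition members by the crisp relation {(x, y) : R(x, y) >= lam}.

    Because R is assumed max-min transitive, the cut is an equivalence
    relation, so comparing against one representative per class suffices.
    """
    classes: List[List[int]] = []
    for i in range(len(members)):
        for cls in classes:
            if relation[cls[0]][i] >= lam:
                cls.append(i)
                break
        else:
            classes.append([i])  # no existing class matched: open a new one
    return [[members[i] for i in cls] for cls in classes]


def aggregate(
    members: Sequence[str],
    relation: Sequence[Sequence[float]],
    measure: Dict[str, int],
    lam: float,
) -> Dict[Tuple[str, ...], int]:
    """Sum the measure over each lambda-cut class (a lambda-dependent roll-up)."""
    return {
        tuple(cls): sum(measure[m] for m in cls)
        for cls in lambda_cut_partition(members, relation, lam)
    }
```

     With these assumed values, λ = 0.9 keeps all four members separate, λ = 0.7 merges a and b, λ = 0.5 also merges c and d, and λ = 0.3 collapses everything into one class, which is the kind of stepwise hierarchy inside a level that the abstract describes.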
     2. Puts forward a clustering-based method for constructing fuzzy dimensions.
     To overcome the difficulty of determining the fuzzy equivalence relation, this thesis proposes two approaches to fuzzy dimension construction according to the scale of the object set: a method based on fuzzy clustering and a method based on relative density clustering. A clustering algorithm based on relative density is also proposed, which produces relatively stable clustering results under different parameters; in other words, the clustering results are not overly sensitive to the parameter settings. High-density clusters can also be identified from the low-density clusters connected to them, so clustering results at multiple density resolutions can be obtained.
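
     The abstract does not say which fuzzy clustering procedure is used to obtain the fuzzy equivalence relation. One classical option, shown here purely as an assumed illustration, is the transitive closure method: start from a reflexive, symmetric similarity matrix and repeatedly compose it with itself under max-min composition until it stops changing. The result is max-min transitive and can therefore serve as a fuzzy equivalence relation. The similarity values and function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: (a o b)[i][j] = max over k of min(a[i][k], b[k][j])."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(similarity: Matrix) -> Matrix:
    """Square the relation until a fixed point is reached.

    Assumes the input is reflexive and symmetric, so each squaring can only
    raise entries and the loop terminates with the max-min transitive closure.
    """
    current = [row[:] for row in similarity]
    while True:
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared


# Hypothetical pairwise similarities between three objects.
SIMILARITY = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.6],
    [0.3, 0.6, 1.0],
]
CLOSURE = transitive_closure(SIMILARITY)
```

     In this example the only entry that changes is the similarity between the first and third objects, which rises from 0.3 to 0.6 because both are linked through the second object.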
     3. Proposes a multi-level sliding window model for data streams and an online aggregation algorithm.
     In data stream processing, more detailed information is generally needed about the recent period than about periods further in the past. From this point of view, a multi-level time window model is proposed to support modeling a data stream at different time granularities in different time periods. A multi-granularity aggregate tree structure and a pyramidal snapshot storage structure for expired data are also designed. Online aggregation and approximate query algorithms for data streams are proposed as well; performance analysis shows that they satisfy the demanding requirements of online aggregation and query analysis of data streams in terms of both storage space and processing time, and thus effectively solve the problem of aggregating and querying data streams under limited space and time.
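
     The idea of keeping recent data fine-grained and older data coarse can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's multi-granularity aggregate tree or pyramidal snapshot structure: each level holds a bounded queue of (count, sum) slots, and when a level overflows, its oldest slots are merged into one coarser slot on the next level. The class name, the parameters num_levels, capacity, and merge_factor, and the choice to drop data that falls off the coarsest level are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Slot = Tuple[int, float]  # (number of raw items covered, sum of their values)


class MultiLevelWindow:
    """Level 0 stores single items; each higher level stores coarser slots."""

    def __init__(self, num_levels: int = 3, capacity: int = 4, merge_factor: int = 2) -> None:
        # merge_factor is assumed to be at most capacity + 1.
        self.levels: List[Deque[Slot]] = [deque() for _ in range(num_levels)]
        self.capacity = capacity
        self.merge_factor = merge_factor

    def add(self, value: float) -> None:
        """Insert one stream item at the finest granularity."""
        self._push(0, (1, value))

    def _push(self, level: int, slot: Slot) -> None:
        if level == len(self.levels):
            return  # older than the coarsest level: dropped in this sketch
        queue = self.levels[level]
        queue.append(slot)
        if len(queue) > self.capacity:
            # Merge the oldest slots of this level into one coarser slot.
            count, total = 0, 0
            for _ in range(self.merge_factor):
                c, s = queue.popleft()
                count, total = count + c, total + s
            self._push(level + 1, (count, total))

    def summary(self) -> Slot:
        """Aggregate (count, sum) over everything still retained."""
        slots = [slot for level in self.levels for slot in level]
        return sum(c for c, _ in slots), sum(s for _, s in slots)
```

     After feeding the values 1 through 10 with the default parameters, level 0 holds the four most recent items individually, level 1 holds three slots that each summarize two older items, and the overall count and sum are still exact because nothing has yet fallen off the coarsest level.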
     4. Proposes a dynamic multi-dimensional data model for data streams together with the corresponding online multi-dimensional aggregation method.
     A dynamic multi-dimensional data model for data streams is proposed based on the time dimension schema of the multi-level time window model. Compared with the multi-dimensional data model of an ordinary data warehouse, the proposed model has the advantage that it supports the infinite span of the time dimension and the dynamic changes of the data set. The infinite span of the time dimension makes it difficult for any storage system to preserve all the data in the whole time domain, so the multi-level time window model is the inevitable choice for modeling the time dimension of a data stream. The rapid and continuous changes of the data set mean that a multi-dimensional data model for data streams should support online multi-dimensional aggregation.
     Because the observed attributes of a data stream tend to be surface-level, detailed, and technical, it is very difficult to select and construct the dimensions in multi-dimensional online analytical processing of data streams. This thesis presents an online clustering algorithm that supports dynamic dimension modeling of data streams, designs a data structure that supports online clustering and multi-dimensional aggregation of data streams, and proposes an online aggregation and materialization method for the basic units of a data stream.
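
     How online clustering and online aggregation can work together is sketched below. The abstract does not describe the thesis's clustering algorithm or data structures, so this example substitutes a simple leader-style clustering of a single numeric attribute: each arriving tuple is assigned to the nearest existing cluster if it lies within a fixed radius, otherwise it starts a new cluster, and the cluster index then acts as a dimension member under which the measure is aggregated. The class names, the radius parameter, and the sample stream are hypothetical.

```python
from typing import Dict, List, Tuple


class OnlineDimension:
    """Leader-style online clustering of a numeric attribute (an assumed stand-in)."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.centers: List[float] = []  # running mean of each cluster
        self.sizes: List[int] = []      # number of items assigned to each cluster

    def assign(self, x: float) -> int:
        """Return the cluster index for x, creating a new cluster when needed."""
        nearest = min(
            range(len(self.centers)),
            key=lambda i: abs(x - self.centers[i]),
            default=None,
        )
        if nearest is None or abs(x - self.centers[nearest]) > self.radius:
            self.centers.append(x)
            self.sizes.append(1)
            return len(self.centers) - 1
        self.sizes[nearest] += 1
        # Incremental update of the running mean.
        self.centers[nearest] += (x - self.centers[nearest]) / self.sizes[nearest]
        return nearest


class StreamCube:
    """Maintains (count, sum) per dimension member as tuples arrive."""

    def __init__(self, radius: float) -> None:
        self.dimension = OnlineDimension(radius)
        self.cells: Dict[int, Tuple[int, float]] = {}

    def insert(self, attribute: float, measure: float) -> int:
        member = self.dimension.assign(attribute)
        count, total = self.cells.get(member, (0, 0))
        self.cells[member] = (count + 1, total + measure)
        return member
```

     Each insertion touches one cluster and one cell, so the per-tuple cost is independent of how much of the stream has already been seen, which is the property an online multi-dimensional aggregation method needs.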
     The research in this thesis on fuzzy and dynamic multi-dimensional data modeling has theoretical and practical significance for promoting the close integration and wider use of data warehouse, OLAP, and data mining technologies.
