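Key Technologies Research on Time Series Data Mining for Large Scale Network Security Situation Analysis (面向大规模网络安全态势分析的时序数据挖掘关键技术研究)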

English title: Key Technologies Research on Time Series Data Mining for Large Scale Network Security Situation Analysis
Author: Cheng Wencong (程文聪)
Thesis level: Doctoral
Discipline: Computer Science and Technology
Keywords: Time series ; Time series data mining ; Network security situation analysis ; Anomaly detection ; Interval skyline ; Similar sub-sequence search ; Prediction
Degree year: 2010
Supervisor: Zou Peng (邹鹏)
Discipline code: 081202
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2010-03-01

Abstract

Network security situation analysis helps network administrators understand the security state of a large-scale network and supports their management decisions, so it has attracted growing attention from governments and research institutions in recent years. To obtain the basic data that such analysis requires, threat detection tools have been deployed on backbone networks. Because of the high performance requirements, these tools are usually deployed in a dedicated manner, so the data they produce are poorly correlated and are hard to process with the association analysis commonly used in small-scale network security analysis. In general, information can only be extracted from them by statistical analysis. The network security time series formed by these statistics as they evolve over time reflect changes in network risk, so security situation analysis of large-scale networks depends heavily on effective mining of these time series.
     Driven by the requirements of large-scale network security situation analysis, this dissertation studies the mining of network security time series data, using as case-study data the Trojan data collected by the “863-917” network security monitoring platform and the botnet data obtained from a honeynet system. From the perspectives of discovering special changes in these data and providing decision support, we identify four key problems and study each in depth. The main contributions are as follows:
     1. Detection of anomalous wave sections in pseudo-periodic network security time series data. Many network security time series are pseudo-periodic, and an anomalous wave section often indicates a change in network security risk that deserves further analysis. Because network environments are unstable, we adopt the dynamic time warping (DTW) distance, which adapts well to shifts in the data, as the similarity measure between wave sections, and detect as anomalous those wave sections that have few similar counterparts in the history. On this basis, we propose a detection method based on a cluster index to speed up the detection process. Experiments on the Trojan and botnet datasets show that the method is more efficient than the algorithm based directly on DTW, at the cost of some loss in detection accuracy.
     2. Interval differential skyline queries over network security time series data streams based on wavelet synopses. Situation analysis requires selecting, from a large number of homogeneous network security time series, those that stand out and deserve focused attention. Existing interval skyline queries are based on a volume (magnitude) measure; they sometimes fail to meet the needs of network security applications and may suffer from a “submerge” phenomenon. We therefore propose the concept of the interval differential skyline, which operates on the growth of the data within a given time interval, to compensate for these shortcomings. Exploiting the differential property of wavelet coefficients, we give an algorithm that answers interval differential skyline queries at different granularities directly on commonly used wavelet synopses in a data stream setting. Experiments on Trojan datasets covering multiple regions and multiple Trojan types show that the method avoids, to some extent, the shortcomings of volume-based interval skyline queries and has lower computational complexity than an algorithm that directly applies a partial inverse wavelet transform.
     3. Similar sub-sequence search over multi-dimensional network security time series data. Historical similar sub-sequences give network administrators a reference for decision making and can be used for qualitative prediction of network security time series. To make better use of recently observed data, we extend the similar sub-sequence search problem to the multi-dimensional scenario by introducing the data cube model. We then improve the search algorithm by exploiting the correlation between cells at adjacent levels of the data cube, which raises search efficiency. Experiments on a multi-dimensional Trojan dataset show that the method finds more valuable matches in network security time series with a multi-dimensional organization and improves search efficiency while preserving accuracy.
     4. Prediction of network security time series data. Prediction is a long-standing problem of great interest and an important requirement of network security situation analysis. Network security time series change in complex ways and are affected by many factors, so it is hard to build a suitable prediction model for them, and traditional prediction methods are often inaccurate on such data. Starting from the idea of case-based reasoning (CBR), we introduce the concepts and methods of frequent episodes from event sequence analysis to offer a new approach to the prediction problem. On this basis, we give two concrete prediction methods, one using the mean-value feature and one using the trend feature, for data with different characteristics. Comparative experiments against several commonly used prediction methods on the Trojan and botnet datasets show that the proposed methods achieve relatively high prediction accuracy on network security time series.
     In summary, this dissertation addresses time series data mining for large-scale network security situation analysis and studies several key technologies that the problem involves in practical applications. The work has both theoretical and practical value for advancing research on this problem and its practical use.
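
The first contribution above uses the dynamic time warping (DTW) distance to find wave sections that have few similar historical counterparts. The Python sketch below illustrates only that baseline idea; it is not the dissertation's cluster-index method. The function names, the absolute-difference local cost, and the `radius` and `min_neighbors` parameters are assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def anomalous_sections(sections, radius, min_neighbors):
    """Indices of wave sections with fewer than `min_neighbors` earlier
    sections within DTW distance `radius`.

    `radius` and `min_neighbors` are hypothetical tuning parameters.
    Sections with fewer than `min_neighbors` predecessors are skipped,
    because they have too little history to be judged.
    """
    flagged = []
    for i, current in enumerate(sections):
        if i < min_neighbors:
            continue
        neighbors = sum(
            1 for past in sections[:i]
            if dtw_distance(current, past) <= radius
        )
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged
```

Each new section is compared against every earlier one, so the cost grows quadratically with the number of sections. This is the expense the abstract says a cluster index is meant to reduce, at some cost in accuracy.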

