Research on Highly Scalable Clustering Analysis Methods
Abstract

Clustering has long been a hot topic in pattern recognition, with applications spanning statistics, image processing, medical diagnosis, information retrieval, biology, and machine learning. Many clustering methods have emerged in recent years, but most are limited by the scalability of their own algorithms: they perform well on data sets of a particular scale, yet achieve little, or fail to run at all, on data sets beyond that scale. With the rapid development of information collection and storage technology, data have become increasingly diverse, so the search for highly scalable clustering methods has drawn growing attention.

     This thesis studies the scalability of clustering algorithms, and in particular the difficulty of applying many of them to large-scale data sets because of their high computational complexity and large memory requirements. The main contributions are as follows:

     (1) Many classic clustering algorithms achieve excellent results on small data sets, but their limited scalability leaves most of them unable to handle large-scale clustering tasks. To make clustering adapt to wide variations in data scale, this thesis adopts a divide-and-conquer strategy and studies in depth a processing scheme that first segments the data set and then partitions it, proposing a clustering method based on data segmentation and partition. The method does not need to read all the data into main memory at once, which greatly reduces its demand on hardware resources, and the cluster centers it produces are less prone to the local optima into which traditional iterative center generation can fall.

     (2) DP is a clustering method with strong scalability that performs well on both small and large data sets, but when the data scale becomes very large, its local-feature sample set can exceed the capacity of main memory. After a detailed analysis of DP theory, this thesis introduces the idea of level-by-level compression and improves DP into a clustering method based on Means Radial Compression (MRC). Compared with DP, MRC scales better, and its O(n) time complexity widens its range of application.

     (3) A visual analysis method for the clustering characteristics of data features based on the minimum distance spectrum (MinDS) is proposed. The data taking part in clustering analysis are usually features produced by a data-representation step; these features should have inherent relationships that make them exhibit grouping, and clustering analysis seeks those groups under some similarity measure. The data-representation process and the choice of data features therefore directly affect the final clustering result. MinDS first defines the minimum distance spectrum model; by analyzing the spectrum, relationships among multidimensional data can be mapped into a two-dimensional space, which proves effective for intuitively assessing the clustering characteristics of data features and for diagnosing why a clustering method fails. MinDS can also be used to handle noise, identify outliers, and estimate the number of data categories.
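The abstract does not spell out the segmentation-and-partition algorithm, but its divide-and-conquer idea can be illustrated with a generic two-level scheme: cluster each chunk separately so only one chunk is ever in memory, then cluster the pooled chunk centers. The names `chunked_cluster` and `kmeans`, and all parameters, are illustrative assumptions, not the thesis's actual method:

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Plain Lloyd-style k-means, used here as the per-chunk worker."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each point to its nearest center
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def chunked_cluster(chunks, k, rng=0):
    """Two-level scheme: cluster each chunk independently (only one
    chunk needs to be resident at a time), then cluster the pooled
    local centers to obtain k global centers."""
    local = [kmeans(chunk, k, rng=rng) for chunk in chunks]
    return kmeans(np.vstack(local), k, rng=rng)
```

The merge step is cheap because it runs only on the small set of local centers, which is what lets the memory footprint stay bounded by the chunk size rather than the full data set.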
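MRC's exact compression rule is not given in the abstract; a single-pass "leader-style" scheme, where each point either updates the running mean of a nearby representative or opens a new one, conveys the flavor of compressing data radially around means. `radial_compress` and `radius` are hypothetical names; the real MRC may differ substantially:

```python
import numpy as np

def radial_compress(X, radius):
    """One pass over the data: fold each point into the running mean
    of the nearest representative within `radius`, or create a new
    representative. Cost is O(n * m) for m representatives, which is
    near-linear when the data compress well."""
    reps, counts = [], []
    for x in X:
        if reps:
            d = np.linalg.norm(np.asarray(reps) - x, axis=1)
            j = int(d.argmin())
            if d[j] <= radius:
                counts[j] += 1
                # incremental mean update keeps memory at O(m)
                reps[j] = reps[j] + (x - reps[j]) / counts[j]
                continue
        reps.append(np.asarray(x, dtype=float))
        counts.append(1)
    return np.array(reps), np.array(counts)
```

A full pipeline would then cluster the representatives (weighted by `counts`) instead of the raw points, which is where the memory savings over working on the whole sample set come from.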
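One minimal reading of the minimum-distance-spectrum idea is to record, for every sample, its distance to the nearest other sample and sort those values; plotting rank against distance then gives a two-dimensional view of the data's grouping behavior regardless of its original dimensionality. This sketch is an assumption about the general idea, not the thesis's actual MinDS model:

```python
import numpy as np

def min_distance_spectrum(X):
    """Each sample's distance to its nearest other sample, sorted
    ascending. Plateaus in this 1-D 'spectrum' suggest compact groups;
    large jumps at the top suggest outliers."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # ignore self-distances
    return np.sort(D.min(axis=1))
```

For instance, a tight cluster contributes a run of small, similar values, while an isolated point shows up as a single value far above the rest, which is how such a plot can expose outliers or a failed data representation at a glance.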
