Research on Manifold-Based Semi-Supervised Classification Methods
Abstract
Semi-supervised learning, and in particular semi-supervised methods for data clustering, has attracted wide attention in machine learning in recent years. Nonlinear manifold-based dimensionality reduction and reproducing kernel Hilbert spaces are two important topics within this area. This thesis focuses on nonlinear dimensionality reduction methods for data clustering, on kernel learning from partial pairwise class information, and on the clustering algorithms derived from them. Our main contributions fall into two parts: dimensionality reduction based on class probabilities, and kernel learning based on probabilistic pairwise constraints. The two parts are closely related, and the kernel learning methods can also be used for dimensionality reduction.
     I. Nonlinear dimensionality reduction methods for data clustering.
     1. We propose PLLE, a nonlinear dimensionality reduction method based on estimated class probabilities. Its main feature is that the class-probability vectors are used to construct a distance function. Unlike the usual Euclidean distance or manifold geodesic distance, this distance retains part of the character of the Euclidean distance while also reflecting the class membership of the points. Compared with earlier constructions that apply only to part of the training set, it applies to the whole training and test sets and is therefore global. PLLE combines the ideas of LLE, the classical (unsupervised) nonlinear dimensionality reduction method, with the characteristics of semi-supervised classification, and it overcomes the difficulty that general manifold learning algorithms have in exploiting supervised information.
     2. The key ingredient of PLLE is the estimation of the class-probability vectors. We further propose the PE algorithm for estimating these vectors, which is based on the classical logistic discriminant (LD) idea. Numerical experiments show that PLLEc, the combination of PLLE and PE, is a strong supervised classification algorithm.
     3. Applying the class-probability idea to the Laplacian Eigenmaps (LE) method for dimensionality reduction, we propose PLE, a semi-supervised dimensionality reduction algorithm that incorporates class information and can be used for data clustering. The class-probability estimates required by PLE can be obtained either with the PE method above or with our kernel-learning-based estimator.
     II. Kernel learning based on partial pairwise constraints.
     1. For data with partial pairwise constraints, many existing algorithms perform dimensionality reduction by seeking an optimal linear projection, and their performance is very sensitive to the distribution of the data. To address this problem, we introduce a novel classification reliability function and a way of determining class-probability vectors, both based on the kernel matrix produced by Pairwise Constraint Propagation (PCP). Combining them with PLE, we obtain the PCP-PLE classification algorithm and its refinement PCP-PLE*, a dimensionality reduction method that also takes the numbers of dimensions and classes into account. Because these algorithms contain an implicit mapping with discriminative power, they can effectively perform constraint-preserving dimensionality reduction for data of arbitrary distribution. Experiments show that PCP-PLE* outperforms several recent algorithms designed for the same setting.
     2. The pairwise-constraint-propagation kernel learning algorithm PCP has certain limitations in practice. Studying it in detail, we find that when the kernel matrix learned by PCP is used for kernel K-means clustering, the Normalized Mutual Information of the resulting partition does not necessarily improve as the amount of known must-link information increases; PCP depends more on the distribution of the known constrained pairs. Motivated by this weakness, we propose PPCP, a kernel learning algorithm that propagates probabilistic pairwise constraints. In many cases PPCP can be more effective than PCP. More importantly, using our reliability estimation for pairwise relations, PPCP can be applied without any prior pairwise constraint information and can therefore serve as an unsupervised clustering algorithm, which is more convenient in practice.
     3. Building on the reliability function, we propose active kernel learning algorithms: active-PCP and active-PPCP. They adaptively search for constrained pairs that harm classification and remove or relax their constraints, thereby improving classification performance. In addition, our most recent work on automatically enlarging the constraint set to improve classification is presented and discussed.
     The thesis consists of six chapters. Chapter 1 describes the background of the topic, the state of the art, and the main contributions of the thesis. Chapter 2 briefly reviews classical work on manifold learning and kernel methods. Chapter 3 describes the PLLE, PE, PLLEc, and PLE algorithms. Chapter 4 presents PCP, the K-means method, and the clustering validity index NMI in detail, and gives the PCP-PLE, PCP-KPCA, and PCP-PLE* algorithms. Chapter 5 proposes PPCP, PCP(PPCP)-Kmeans, active-PCP(PPCP), and the enlarged PCP(PPCP) algorithms. Chapter 6 summarizes the work and discusses directions for future research.
Semi-supervised learning, especially semi-supervised methods for clustering, is an active topic in machine learning. Nonlinear dimensionality reduction based on manifold learning and Reproducing Kernel Hilbert Spaces are both important subjects in this field. This thesis concerns nonlinear dimensionality reduction for data classification, kernel learning from pairwise constraints, and the clustering methods derived from them. We address two pieces of work: dimensionality reduction based on class probabilities and kernel learning based on probabilistic pairwise constraints. The two parts are closely related, and the kernel learning methods can also be used for dimensionality reduction.
     1. Nonlinear dimensionality reduction for data classification.
     (a). We propose a novel nonlinear dimensionality reduction algorithm, PLLE, based on class probabilities. PLLE builds a distance function from the class-probability vectors. Unlike the Euclidean distance or the manifold geodesic distance, this distance preserves part of the character of the Euclidean distance while also encoding the class membership of the points. As a result, it applies to the whole training and test sets rather than only to the labelled part of the training set. PLLE absorbs the ideas of the classical unsupervised nonlinear dimensionality reduction method LLE while retaining the character of semi-supervised classification, and it resolves the difficulty of introducing supervised information into manifold learning methods.
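As an illustration of this idea only, the following minimal Python sketch (my own reconstruction, not the thesis code; the mixing weight `lam`, the helper names, and the hand-rolled LLE steps are assumptions) blends the Euclidean distance with a distance between class-probability vectors and feeds the resulting neighbourhoods into a standard LLE embedding.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def blended_distance(X, P, lam=1.0):
    """Distance mixing Euclidean geometry with class-probability vectors.

    X : (n, d) data matrix; P : (n, K) class-probability matrix (rows sum to 1).
    """
    return np.sqrt(cdist(X, X, 'sqeuclidean') + lam * cdist(P, P, 'sqeuclidean'))

def lle_embedding(X, D, n_neighbors=10, n_components=2, reg=1e-3):
    """Standard LLE, except that neighbours are chosen under the distance D."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[1:n_neighbors + 1]      # neighbours under the blended metric
        Z = X[idx] - X[i]                               # centre the neighbourhood
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)    # regularise the local Gram matrix
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx] = w / w.sum()                         # reconstruction weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = eigh(M)
    return vecs[:, 1:n_components + 1]                  # drop the constant bottom eigenvector
```

With `lam = 0` this reduces to ordinary unsupervised LLE; larger `lam` pulls points with similar class-probability vectors together before the embedding is computed.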
     (b). The estimation of class probabilities is the key part of PLLE. The thesis proposes the PE algorithm to estimate the class-probability vectors; it is derived from the idea of logistic discrimination. Numerical experiments show that PLLEc, the combination of PLLE and PE, is an effective supervised classifier.
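The abstract only names the logistic-discriminant idea behind PE; a minimal stand-in (assuming class labels are integers 0..K-1, that -1 marks unlabelled points, and that scikit-learn's multinomial logistic regression replaces whatever estimator the thesis actually uses) could look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_class_probabilities(X, y_partial):
    """Fit a logistic model on the labelled points and return a probability
    vector for every point; labelled points keep a hard 0/1 assignment."""
    labelled = y_partial >= 0
    clf = LogisticRegression(max_iter=1000).fit(X[labelled], y_partial[labelled])
    P = clf.predict_proba(X)                    # columns follow clf.classes_
    P[labelled] = 0.0
    P[labelled, y_partial[labelled]] = 1.0      # assumes classes are labelled 0..K-1
    return P
```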
     (c). We propose a semi-supervised dimensionality reduction method called PLE, which applies the class-probability idea to the Laplacian Eigenmaps method. PLE can be used for data clustering, and the class probabilities it requires can be obtained either with PE or with our kernel-learning-based estimator.
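A minimal Laplacian-eigenmap step under these assumptions (the Gaussian affinity width `sigma` and the use of the blended distance from the earlier sketch are my choices, not the thesis's):

```python
import numpy as np
from scipy.linalg import eigh

def ple_embedding(D, n_components=2, sigma=1.0):
    """Laplacian-eigenmap embedding from a distance matrix D that already
    encodes class-probability information (e.g. the blended distance above)."""
    W = np.exp(-D ** 2 / (2.0 * sigma ** 2))    # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                        # unnormalised graph Laplacian
    _, vecs = eigh(L, np.diag(deg))             # generalised problem L y = lambda Deg y
    return vecs[:, 1:n_components + 1]          # skip the trivial constant eigenvector
```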
     2. Kernel Learning based on pairwise constraints.
     (a). Many dimensionality reduction algorithms handle data sets with pairwise constraints by seeking the best linear projection, but such methods are highly sensitive to the distribution of the data. To address this problem, we propose a novel way to determine a reliability function and class-probability vectors for classification, based on the kernel matrix learned by Pairwise Constraint Propagation (PCP). Combining it with PLE yields the classification method PCP-PLE; taking the numbers of dimensions and classes into account gives an improved version, PCP-PLE*. Both algorithms contain an implicit mapping designed for classification, so they can perform constraint-preserving dimensionality reduction for data of arbitrary distribution. Numerical results show that PCP-PLE* outperforms some of the latest algorithms developed for the same setting.
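The abstract does not spell out the PCP optimisation (the cited work solves it by semidefinite programming). As a loose, hedged illustration of what "a kernel matrix learned from pairwise constraints" means, the following stand-in merely overwrites an RBF kernel on the constrained pairs and projects back to the positive-semidefinite cone; it is not the PCP algorithm, only a placeholder whose output could feed the PLE-style embedding sketched above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def constraint_adjusted_kernel(X, must_link, cannot_link, sigma=1.0):
    """Crude stand-in for a constraint-propagation step: start from an RBF
    kernel, force must-link entries to 1 and cannot-link entries to 0, then
    clip negative eigenvalues so the result is positive semidefinite again."""
    K = np.exp(-cdist(X, X, 'sqeuclidean') / (2.0 * sigma ** 2))
    for i, j in must_link:
        K[i, j] = K[j, i] = 1.0
    for i, j in cannot_link:
        K[i, j] = K[j, i] = 0.0
    vals, vecs = np.linalg.eigh(K)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T
```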
     (b). Pairwise Constraint Propagation (PCP), which we analyze in detail, has some limitations as a kernel learning method. We find that when kernel K-means is used for clustering with the kernel matrix learned by PCP, the Normalized Mutual Information does not always improve as the number of constraints increases; the performance of PCP depends strongly on the distribution of the constrained pairs. We therefore propose PPCP, a model that relaxes the hard pairwise constraints to probabilistic constraints. PPCP can be more effective than PCP in many cases. Furthermore, based on the reliability function designed above, PPCP can be used without any prior pairwise constraint knowledge and can therefore be applied more widely, as an unsupervised clustering algorithm.
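To make the evaluation protocol concrete, here is a textbook kernel K-means (not the thesis's PCP(PPCP)-Kmeans) that works directly on a kernel matrix, with scikit-learn's NMI used to score the resulting partition against ground-truth labels; `K_learned` and `labels_true` are placeholders.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Kernel K-means using only the kernel matrix K: the squared distance of
    point i to the mean of cluster c in feature space is
    K_ii - 2/|c| * sum_{j in c} K_ij + 1/|c|^2 * sum_{j,l in c} K_jl."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            nc = mask.sum()
            if nc == 0:
                continue                                    # empty cluster stays unattractive
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / nc
                          + K[np.ix_(mask, mask)].sum() / nc ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Example scoring with placeholders:
# labels_pred = kernel_kmeans(K_learned, n_clusters=3)
# print(normalized_mutual_info_score(labels_true, labels_pred))
```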
     (c). Based on the reliability function, we propose a family of active kernel learning algorithms, active-PCP and active-PPCP. They adaptively search for pairwise constraints that harm classification and then eliminate or relax them, which improves classification performance. In addition, we propose an approach that automatically enlarges the set of pairwise constraints, making the algorithms better suited to classification.
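The active step is only described at a high level here. As a toy heuristic in the same spirit (entirely my own, not the thesis's reliability function), one could flag must-link pairs that the current kernel considers unusually dissimilar, and cannot-link pairs it considers unusually similar, as candidates for removal or relaxation:

```python
import numpy as np

def flag_suspect_constraints(K, must_link, cannot_link, quantile=0.1):
    """Return the must-link pairs with the lowest kernel similarity and the
    cannot-link pairs with the highest, as candidates to drop or relax."""
    ml_sim = np.array([K[i, j] for i, j in must_link])
    cl_sim = np.array([K[i, j] for i, j in cannot_link])
    suspect_ml = [p for p, s in zip(must_link, ml_sim)
                  if s <= np.quantile(ml_sim, quantile)]
    suspect_cl = [p for p, s in zip(cannot_link, cl_sim)
                  if s >= np.quantile(cl_sim, 1.0 - quantile)]
    return suspect_ml, suspect_cl
```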
     The thesis consists of six chapters. Chapter 1 states the basic concepts, background, and development of the topic, and lists the contributions of the thesis. Chapter 2 gives a brief review of manifold learning and kernel methods. The main algorithms are presented in Chapters 3, 4, and 5. Concluding remarks and future perspectives are given in the last chapter.
