Research on Low-Rank Decomposition of Kernel Matrices and the Informative Energy Metric in Kernel Space, with Applications
Abstract
Kernel methods have been applied with considerable success in pattern classification. Their strength is twofold: they can efficiently analyze the nonlinear relationships present in data, and, like linear statistical methods, they rest on a rigorous statistical foundation. On large-scale classification tasks, however, kernel methods run into two problems. First, their computational complexity is high: algorithm design and solution depend on the number of training samples, and the classic algorithms rely on convex quadratic optimization, which demands substantial time and space on large data sets. Second, because the kernel space is high-dimensional or even infinite-dimensional, the samples in it exhibit multiple modes and polymorphism, so their similarity is hard to describe. To address these problems, this thesis focuses on low-rank decomposition of kernel matrices and on similarity measures in kernel space.
     First, we study how to learn a good low-rank approximation of the kernel matrix from the viewpoint of feature selection and matrix decomposition. Second, we characterize high-dimensional data in kernel space with a distance-based measure. On this basis, and in comparison with existing algorithms, we use kernel methods built on low-rank kernel-matrix decomposition and the informative energy metric to carry out feature extraction and pattern classification for high-dimensional, multi-modal objects; the experimental results verify the effectiveness of the proposed algorithms.
     In summary, the main contributions of this thesis cover the following five aspects:
     1. To reduce the high time complexity of kernel-matrix computation, we study low-rank decomposition of the kernel matrix. The common low-rank matrix decompositions can all be regarded as unsupervised. By analyzing the correlation between the rows/columns of the kernel matrix and the class labels, and combining this with existing decomposition procedures, we propose a supervised low-rank decomposition of the kernel matrix and derive the expected error bound of the low-rank approximation. Experiments show that the choice of rows/columns has a strong impact on classification accuracy; while preserving classification performance, our algorithm improves the efficiency of kernel machine learning and so lays a solid foundation for applications to large-scale data sets (a Nyström-style sketch follows this list).
     2. Kernel methods have been applied successfully to low-dimensional data, but on high-dimensional data the richer internal structure causes common similarity measures such as the Euclidean distance to deliver poor classification performance. By studying non-distance measures, we propose a new informative energy metric. It satisfies the distance metric axioms, is suitable not only for low-dimensional data, and also uncovers similarity structure in high-dimensional data effectively. Experimental results verify the validity of this similarity measure (see the metric-axiom check after this list).
     3. We study feature extraction in kernel space. Combining the proposed informative energy metric with gradient ascent, we propose a new feature extraction algorithm. Applied to large-scale data sets, it can use the low-rank approximate decomposition of the kernel matrix to cut the computational cost, and it requires no prior feature selection. Moreover, the gradient of the informative energy describes how features aggregate, which helps guide the classification task (see the gradient-ascent sketch below).
     4. Based on the proposed informative energy metric, and by studying and improving existing nearest-neighbor algorithms and their kernelized forms, we propose a new kernel k-nearest-neighbor algorithm. It effectively combines the characteristics of the nearest-neighbor rule and of kernel methods, has a clear physical interpretation, and the classical mutual information measure can be derived from the model, giving it a solid theoretical basis and good generalization ability (see the kernel k-NN sketch below).
     5. We kernelize an existing algorithm for learning morphological differences in cell phenotype images, proposing a new kernel morphological-difference learning algorithm based on the informative energy metric, and we design a series of optimal parameter-selection schemes that secure the experimental results, laying a foundation for further applications of kernel methods in this field.
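To make contribution 1 concrete: below is a minimal Nyström-style sketch of a low-rank approximation K ≈ C W⁺ Cᵀ in which the landmark columns are chosen by a label-alignment score rather than uniformly. The scoring rule, function names, and parameters are our own illustrative assumptions; the thesis defines its own row/column-to-label correlation criterion and error-bound analysis.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def supervised_nystrom(X, y, rank, gamma=0.5):
    """Rank-`rank` Nystrom approximation K ~= C W^+ C^T, picking landmark
    columns by a (hypothetical) label-alignment score instead of uniform
    sampling."""
    K = rbf_kernel(X, X, gamma)                    # toy: full kernel, for scoring clarity
    L = (y[:, None] == y[None, :]).astype(float)   # ideal "label kernel"
    norms = np.linalg.norm(K, axis=0) * np.linalg.norm(L, axis=0)
    scores = (K * L).sum(axis=0) / (norms + 1e-12) # per-column alignment with labels
    idx = np.argsort(scores)[-rank:]               # top-scoring landmark columns
    C = K[:, idx]                                  # n x rank
    W = K[np.ix_(idx, idx)]                        # rank x rank
    return C @ np.linalg.pinv(W) @ C.T, idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(3.0, 1.0, (50, 5))])
    y = np.array([0] * 50 + [1] * 50)
    K_hat, idx = supervised_nystrom(X, y, rank=10)
    K = rbf_kernel(X, X)
    print("relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```

This toy materializes the full kernel matrix just to score the columns; a large-scale implementation would of course avoid exactly that.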
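The informative energy metric itself is defined only in the thesis body and is not reproduced here. For orientation, the standard kernel-induced distance below illustrates what a kernel-space measure "satisfying the distance metric axioms" means operationally, with numerical spot checks of those axioms:

```python
import numpy as np

def pairwise_kernel_distance(K):
    """Feature-space distances induced by a kernel matrix K:
    d(x, z) = sqrt(k(x,x) - 2 k(x,z) + k(z,z)) = ||phi(x) - phi(z)||."""
    diag = np.diag(K)
    d2 = diag[:, None] - 2.0 * K + diag[None, :]
    return np.sqrt(np.maximum(d2, 0.0))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 4))
    K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # RBF kernel
    D = pairwise_kernel_distance(K)
    # spot-check the metric axioms on this sample
    assert np.allclose(D, D.T)                    # symmetry
    assert np.allclose(np.diag(D), 0.0)           # d(x, x) = 0
    i, j, l = 3, 7, 11
    assert D[i, j] <= D[i, l] + D[l, j] + 1e-9    # triangle inequality
    print("metric axioms hold on this sample")
```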
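Contribution 3 optimizes the informative energy by gradient ascent. Because that objective is thesis-specific, the skeleton below substitutes a simple Fisher-style separability ratio as a stand-in, purely to show the shape of a gradient-ascent feature-extraction loop; every name and formula in it is an assumption, not the thesis's algorithm.

```python
import numpy as np

def stand_in_objective(w, X, y):
    """Stand-in for the informative energy (thesis-specific): the
    Fisher-style ratio of between- to within-class spread of Xw."""
    z = X @ w / (np.linalg.norm(w) + 1e-12)
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    v = z[y == 0].var() + z[y == 1].var() + 1e-12
    return (m0 - m1) ** 2 / v

def extract_direction(X, y, steps=200, lr=0.1, eps=1e-5):
    """Extract one feature direction by gradient ascent, using a
    central-difference numerical gradient of the objective."""
    w = np.random.default_rng(3).normal(size=X.shape[1])
    for _ in range(steps):
        grad = np.array([(stand_in_objective(w + eps * e, X, y) -
                          stand_in_objective(w - eps * e, X, y)) / (2 * eps)
                         for e in np.eye(len(w))])
        w += lr * grad
    return w / np.linalg.norm(w)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(0, 1, (60, 3)),
                   rng.normal([2, 0, 0], 1, (60, 3))])
    y = np.array([0] * 60 + [1] * 60)
    print("direction:", np.round(extract_direction(X, y), 2))  # loads on axis 0
```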
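Contribution 4 kernelizes k-nearest neighbors. As a hedged stand-in for the informative energy metric, this sketch ranks neighbors by the generic kernel-induced distance d²(x, z) = k(x,x) − 2k(x,z) + k(z,z), which is the standard route to a kernel k-NN; the thesis's derivation of mutual information from its model is not reproduced here.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

def kernel_knn_predict(X_train, y_train, X_test, k=5, gamma=0.5):
    """k-NN with neighbors ranked by squared feature-space distance
    d^2(x, z) = k(x,x) - 2 k(x,z) + k(z,z)."""
    k_xx = np.diag(rbf(X_test, X_test, gamma))     # k(x,x) per test point
    k_zz = np.diag(rbf(X_train, X_train, gamma))   # k(z,z) per training point
    d2 = k_xx[:, None] - 2.0 * rbf(X_test, X_train, gamma) + k_zz[None, :]
    nn = np.argsort(d2, axis=1)[:, :k]             # indices of the k nearest
    # majority vote among the neighbors' labels
    return np.array([np.bincount(y_train[row]).argmax() for row in nn])

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
    y = np.array([0] * 40 + [1] * 40)
    print(kernel_knn_predict(X, y, np.array([[0.1, -0.2], [2.9, 3.1]])))  # -> [0 1]
```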
