Research on Multiple Kernel Support Vector Machine Algorithms Based on the Radius-Margin (RM) Bound
Abstract
The support vector machine (SVM) is a relatively new technique in data mining and a new tool for solving machine learning problems by means of optimization methods. SVMs offer global optimality, a simple structure, and strong generalization ability. By introducing kernel functions, the SVM simplifies inner-product computations in high-dimensional spaces and thereby handles nonlinear classification problems well. For multiple data sources or heterogeneous data sets, a single kernel exhibits many limitations during learning, so multiple kernel functions become necessary. Convex combinations of common kernel functions are an important form of multiple kernel construction and multiple kernel learning. Like the kernel parameters themselves, the coefficients of the component kernels in a multiple kernel function also need to be optimized. Taking the general gradient method as its basis, this thesis studies optimization methods for these combination coefficients. The main work is as follows:
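To make the kernel trick mentioned above concrete, here is a minimal Python check (an illustrative example assuming the degree-2 polynomial kernel on R^2, not a kernel taken from the thesis): a kernel evaluation equals an ordinary inner product under an explicit feature map.

```python
# Kernel trick in miniature: for k(x, z) = (x . z)^2 on R^2, the kernel
# value equals the inner product of the explicit 3-D feature maps
# phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).
import numpy as np

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ z) ** 2)     # one 2-D inner product, then a square: 1.0
print(phi(x) @ phi(z))  # the same value computed in the 3-D feature space
```

The kernel never constructs phi explicitly, which is what keeps inner products in high-dimensional (even infinite-dimensional) feature spaces affordable.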
     First, the thesis briefly introduces the basic theory of support vector machines and their problem models, examines the essence of kernel functions, explains the relationship between a kernel function and the space it maps into, and further analyzes theorems and methods for constructing kernels. For multiple kernel functions, the thesis proves that a convex combination of Mercer kernels is again a Mercer kernel, and numerical experiments show that a multiple kernel formed as a convex combination of common kernels strengthens the classification power of the decision function and improves the generalization performance of the classifier.
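This closure property is easy to check numerically. The sketch below (an illustration assuming an RBF and a polynomial kernel as component kernels, not the thesis's experimental setup) forms a convex combination of two Gram matrices on a random sample and confirms that the result is positive semidefinite, which is Mercer's condition restricted to a finite sample:

```python
# PSD matrices are closed under nonnegative combination, so a convex
# combination of Mercer kernels yields a PSD Gram matrix on any sample.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

def rbf_gram(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def poly_gram(X, degree=3, c=1.0):
    return (X @ X.T + c) ** degree

d = np.array([0.6, 0.4])                      # convex weights: d >= 0, sum(d) = 1
K = d[0] * rbf_gram(X) + d[1] * poly_gram(X)  # combined multiple-kernel Gram

print(np.linalg.eigvalsh(K).min())            # >= 0 up to round-off, hence PSD
```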
     Second, the thesis surveys the existing gradient algorithms for optimizing the coefficients of the component kernels in a multiple kernel function. These algorithms are simple to implement and converge quickly, but using the geometric margin alone as the optimization criterion is rather crude, so the thesis seeks other upper bounds to serve as the objective. With gradient methods, SVM hyper-parameters can be selected by minimizing an upper bound on the SVM risk over the hyper-parameter space. The thesis extends this idea and trains the combination coefficients of the multiple kernel by minimizing the radius-margin bound (RM bound). Numerical experiments show that classifiers trained with the RM bound added to the optimization program perform better, which demonstrates the effectiveness of the proposed algorithm.
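A minimal sketch of this kind of procedure follows. It is not the thesis's algorithm: it assumes two RBF base kernels, takes ||w||^2 from scikit-learn's SVM dual solution, obtains R^2 from the standard smallest-enclosing-ball program, and updates the simplex-constrained kernel weights with finite-difference projected-gradient steps, whereas the thesis derives analytic gradients.

```python
# Hypothetical sketch of RM-bound-driven multiple kernel learning.
# Assumptions (not from the thesis): two RBF base kernels, ||w||^2 from
# scikit-learn's SVM dual, R^2 from the enclosing-ball QP, and
# finite-difference projected-gradient updates on the simplex.
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC

def rbf_gram(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def combined_gram(X, d, gammas):
    # Convex combination K(d) = sum_m d_m K_m of the base Gram matrices.
    return sum(dm * rbf_gram(X, g) for dm, g in zip(d, gammas))

def w_norm_sq(K, y, C=1000.0):
    # ||w||^2 = sum_ij alpha_i alpha_j y_i y_j K_ij, from the dual solution.
    svc = SVC(kernel="precomputed", C=C).fit(K, y)
    a = svc.dual_coef_.ravel()            # alpha_i * y_i on the support vectors
    sv = svc.support_
    return float(a @ K[np.ix_(sv, sv)] @ a)

def radius_sq(K):
    # R^2 = max_beta sum_i beta_i K_ii - beta' K beta, beta in the simplex:
    # squared radius of the smallest ball enclosing the mapped sample.
    n = K.shape[0]
    kd = np.diag(K)
    res = minimize(lambda b: -(b @ kd - b @ K @ b),
                   np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=({"type": "eq", "fun": lambda b: b.sum() - 1.0},))
    return float(-res.fun)

def rm_bound(d, X, y, gammas):
    K = combined_gram(X, d, gammas)
    return radius_sq(K) * w_norm_sq(K, y)   # the RM bound R^2 * ||w||^2

def project_simplex(v):
    # Euclidean projection onto {d : d >= 0, sum_m d_m = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1.0)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1), 0.0)

def learn_weights(X, y, gammas=(0.1, 1.0), steps=20, lr=0.05, eps=1e-4):
    d = np.full(len(gammas), 1.0 / len(gammas))
    for _ in range(steps):
        f0 = rm_bound(d, X, y, gammas)
        grad = np.array([(rm_bound(d + eps * e, X, y, gammas) - f0) / eps
                         for e in np.eye(d.size)])
        d = project_simplex(d - lr * grad)
    return d
```

For a binary sample X, y, learn_weights(X, y) returns the simplex weights of the two base kernels; the step size, iteration count, and base-kernel widths here are placeholders for illustration only.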
