Research on Model Selection for Support Vector Machines
Abstract
Statistical learning theory (SLT) provides a fairly complete theoretical framework for studying machine learning from finite samples. The support vector machine (SVM) is a learning method developed within this framework. Compared with traditional learning algorithms, SVM copes well with small samples, nonlinearity, overfitting, the curse of dimensionality, and local minima, and it generalizes well to unseen data. SVM has been successfully applied to a wide range of data analysis problems, including pattern recognition, regression estimation, and probability density estimation. Moreover, SVM has driven the rapid development of kernel-based learning methods, which let researchers analyze nonlinear relationships with an efficiency previously attainable only by linear algorithms. Kernel methods, with SVM as their main representative, are currently one of the focal research topics in the machine learning community.
     It is well known that the performance of SVM depends mainly on two factors: (i) the choice of the kernel function, and (ii) the choice of the penalty coefficient (regularization parameter) C. For a given problem, determining the kernel function and the penalty coefficient is known as the model selection problem. Model selection, and kernel selection in particular, is one of the central topics in SVM research; a cross-validation sketch of this tuning problem follows the list below. This dissertation investigates the model selection problem, especially kernel selection, in some depth. The main work and contributions are as follows:
     1. We systematically survey statistical learning theory, kernel feature spaces, and the theory and algorithms of SVM, which form the basis of this work. We aim to keep the presentation concise without sacrificing completeness or systematic coverage, and we weave some of our own insights into the exposition.
     2. We examine the basic semantics of the SVM parameters and point out that the influence of different features and different samples on the classification result can be captured by the kernel parameters and the penalty coefficient, respectively; hence the study of sample importance and feature importance can be reduced to an SVM model selection problem. Building on an analysis of sample-weighted SVM models (such as fuzzy SVM), we propose a feature-weighted SVM model, FWSVM. FWSVM is in essence the combination of feature weighting and SVM: by introducing the feature weights into the construction of the kernel function, we can study the influence of feature weighting on SVM classification performance from the perspective of the kernel (see the kernel sketch after this list). Both theoretical analysis and numerical experiments show that FWSVM generalizes better than the standard SVM.
     3. After systematically summarizing the common methods for SVM model selection, especially kernel parameter selection (e.g., cross-validation, minimizing the LOO error or its upper bounds, and optimizing kernel evaluation measures), we further investigate the geometric meaning of kernel polarization, showing that a high kernel polarization value means that within-class data points stay close to each other while between-class data points are kept apart. We then propose KPG, a gradient-ascent algorithm that selects the parameters of general Gaussian kernels by optimizing kernel polarization (a sketch follows this list). Compared with an optimized standard Gaussian kernel, the general Gaussian kernel tuned by KPG yields better SVM generalization. In addition, we propose KPFS, a variant of KPG for SVM feature selection, whose effectiveness is preliminarily verified on UCI machine learning benchmarks.
     4. Inspired by local Fisher discriminant analysis (LFDA), we take a close look at kernel evaluation measures in the presence of multimodality, i.e., where samples of the same class form several separate clusters (local structure within a class). We point out that the commonly used kernel evaluation measures all ignore the influence of this within-class local structure on classification performance, and that such "global" measures may leave fewer degrees of freedom for increasing separability. To overcome this drawback, we propose a "localized" kernel evaluation measure, local kernel polarization. By introducing affinity coefficients between data pairs, local kernel polarization preserves to some extent the local structure of same-class data and can further increase the separability of between-class data (a sketch of one plausible form closes this section). The effectiveness of this measure is verified by experiments on UCI datasets.
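To make the model selection problem concrete, here is a minimal sketch of the most common baseline surveyed above: k-fold cross-validation over a grid of (C, gamma) values for a Gaussian (RBF) kernel. The synthetic dataset, the grid, and the use of scikit-learn are illustrative assumptions, not the dissertation's experimental setup.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Toy two-class problem standing in for a real dataset (assumption).
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # Model selection = jointly picking the penalty C and the kernel
    # parameter gamma; 5-fold cross-validation scores each pair.
    param_grid = {
        "C": [0.1, 1, 10, 100],        # penalty (regularization) coefficient
        "gamma": [0.01, 0.1, 1, 10],   # width of the Gaussian (RBF) kernel
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)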
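Contribution 2 builds feature weights into the kernel itself. The dissertation's exact construction is not reproduced here; the sketch below assumes one natural form, a Gaussian kernel over feature-weighted squared distances, and plugs it into an SVM through a precomputed Gram matrix.

    import numpy as np
    from sklearn.svm import SVC

    def fw_gaussian_gram(X, Z, w, gamma=1.0):
        """Gram matrix of k_w(x, z) = exp(-gamma * sum_k w_k (x_k - z_k)^2).

        w holds nonnegative per-feature weights; w = 1 everywhere recovers
        the standard Gaussian kernel. This is one plausible FWSVM kernel,
        stated here as an assumption."""
        d2 = (w * (X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    # Usage: fit on the (n_train, n_train) Gram matrix, predict with the
    # (n_test, n_train) cross-Gram matrix (toy data, for illustration).
    rng = np.random.default_rng(0)
    Xtr, ytr = rng.normal(size=(40, 5)), rng.choice([-1, 1], size=40)
    Xte = rng.normal(size=(10, 5))
    w = np.ones(5)                       # uniform weights = standard kernel
    clf = SVC(kernel="precomputed").fit(fw_gaussian_gram(Xtr, Xtr, w), ytr)
    yhat = clf.predict(fw_gaussian_gram(Xte, Xtr, w))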
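Contribution 3 optimizes kernel polarization, which for labels y_i in {-1, +1} is the alignment-like quantity P = sum_ij y_i y_j K(x_i, x_j) = y'Ky: it grows when same-class kernel values are large and different-class values are small. The sketch below runs plain gradient ascent of P over the per-feature scales of a diagonal general Gaussian kernel; the step size, iteration count, and the restriction to a diagonal kernel are assumptions, not the dissertation's KPG in full detail.

    import numpy as np

    def polarization_ascent(X, y, steps=100, lr=0.01):
        """Gradient ascent of P(theta) = y' K(theta) y for the diagonal
        general Gaussian kernel K_ij = exp(-sum_k theta_k (x_ik - x_jk)^2)."""
        d = X.shape[1]
        theta = np.ones(d)                            # per-feature scales
        yy = np.outer(y, y)                           # y_i * y_j
        D2 = (X[:, None, :] - X[None, :, :]) ** 2     # (n, n, d) squared gaps
        for _ in range(steps):
            K = np.exp(-(D2 * theta).sum(axis=-1))
            # dP/dtheta_k = -sum_ij y_i y_j K_ij (x_ik - x_jk)^2
            grad = -((yy * K)[:, :, None] * D2).sum(axis=(0, 1))
            theta = np.maximum(theta + lr * grad, 0.0)  # keep kernel valid
        return theta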
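Contribution 4 localizes the measure above. The dissertation defines local kernel polarization via affinity coefficients between data pairs; its exact form is not reproduced here. The sketch below assumes one plausible variant: local-scaling affinities A_ij that down-weight distant same-class pairs, while between-class pairs keep their full (negative) weight.

    import numpy as np

    def local_kernel_polarization(K, X, y, k=7):
        """P_loc = sum_ij W_ij K_ij with W_ij = A_ij if y_i == y_j else -1,
        where A_ij = exp(-||x_i - x_j||^2 / (s_i * s_j)) and s_i is the
        distance from x_i to its k-th nearest neighbour (local scaling).
        This specific choice of A and W is an assumption."""
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        s = np.sort(d, axis=1)[:, k]      # k-th NN distance per point (n > k)
        A = np.exp(-d ** 2 / (np.outer(s, s) + 1e-12))
        same = (y[:, None] == y[None, :])
        W = np.where(same, A, -1.0)
        return (W * K).sum()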
