Abstract
To address the tendency of the correlation-based feature selection (CFS) algorithm to favor features that take a large number of values, an improved CFS algorithm that incorporates the information gain ratio and symmetric uncertainty is introduced. To further reduce the feature dimension and improve classification efficiency, a hybrid feature selection method based on a filter-wrapper model is proposed. First, the improved CFS algorithm filters out irrelevant features. Then, starting from the reduced subset, the sequential backward search algorithm from wrapper-style selection is combined with a decision tree to select the optimal feature subset. Simulation experiments show that the feature subset selected by this method has better classification ability, and that the method's generalization ability varies across classification models.
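The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the information gain ratio and symmetric uncertainty enter the improved merit, so the sketch assumes the standard CFS merit with symmetric uncertainty as the correlation measure, a greedy forward search for the filter stage, and a simple majority-vote lookup table as a stand-in for the decision tree in the wrapper stage. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def cfs_merit(subset, columns, labels):
    """Standard CFS merit, using SU as the correlation measure (an assumption)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(symmetric_uncertainty(columns[f], labels) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(symmetric_uncertainty(columns[a], columns[b])
                   for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_filter(columns, labels):
    """Filter stage: greedily add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        scored = [(cfs_merit(selected + [f], columns, labels), f) for f in remaining]
        merit, feat = max(scored, key=lambda t: t[0])  # first maximum wins ties
        if merit <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = merit
    return selected


def sequential_backward_selection(features, score):
    """Wrapper stage: drop one feature at a time while the score does not fall."""
    current = list(features)
    best = score(current)
    while len(current) > 1:
        trials = [(score([f for f in current if f != d]), d) for d in current]
        trial_score, drop = max(trials, key=lambda t: t[0])
        if trial_score < best:
            break
        current.remove(drop)
        best = trial_score
    return current


def lookup_accuracy(subset, columns, labels):
    """Stand-in scorer (NOT a decision tree): training accuracy of a majority-vote table."""
    votes = {}
    for key, label in zip(zip(*(columns[f] for f in subset)), labels):
        votes.setdefault(key, Counter())[label] += 1
    return sum(max(c.values()) for c in votes.values()) / len(labels)


# Toy data (hypothetical): "a" and "a_copy" match the class, "noise" is independent of it.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
columns = {"a": list(labels), "a_copy": list(labels), "noise": [0, 1] * 4}

filtered = cfs_filter(columns, labels)
final = sequential_backward_selection(
    filtered, lambda s: lookup_accuracy(s, columns, labels))
print(filtered, final)
```

In practice the `score` callable would more plausibly be the cross-validated accuracy of a decision tree trained on the candidate subset; the lookup table is used here only to keep the sketch free of external dependencies.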