Hybrid Feature Selection Algorithm Based on Machine Learning
  • English title: Mixed feature selection method based on machine learning
  • Authors: Lei Hairui; Gao Xiufeng; Liu Hui
  • Keywords: feature selection; information gain ratio; symmetric uncertainty; CFS; decision tree
  • Journal: Electronic Measurement Technology (journal code: DZCL)
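  • Affiliations: Equipment Simulation Training Center, Shijiazhuang Campus of Army Engineering University; Military Representative Office of the PLA Stationed at Factory 5413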
  • Publication date: 2018-08-23
  • Year: 2018
  • Issue: 16 (v.41, cumulative No. 300)
  • Pages: 48-52 (5 pages)
  • CNKI article ID: DZCL201816009
  • CN: 11-2175/TN
  • Language: Chinese
Abstract
To address the tendency of the CFS (correlation-based feature selection) algorithm to favor features that take a large number of values, an improved CFS algorithm incorporating the information gain ratio and symmetric uncertainty is introduced. In addition, to further reduce the feature dimensionality and improve classification efficiency, a hybrid feature selection method based on a filter-wrapper model is proposed. First, the improved CFS algorithm filters out irrelevant features; then, starting from this reduced subset, the sequential backward selection algorithm of the wrapper approach is combined with a decision tree to select the optimal feature subset. Simulation experiments show that the feature subset selected by this method has better classification ability, and that the generalization ability of the method differs across classification models.
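The filter stage can be illustrated with a minimal Python sketch. It assumes discrete (already discretized) features. Because the abstract does not give the exact formula of the improved CFS, the sketch assumes, purely hypothetically, that the information gain ratio serves as the feature-class correlation and symmetric uncertainty as the feature-feature correlation inside the standard CFS merit. The greedy forward search is a simplification of the best-first search commonly used with CFS, and all function names are illustrative only.

```python
from collections import Counter
from math import log2, sqrt


def entropy(xs):
    """Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def cond_entropy(xs, ys):
    """H(X | Y) = sum over y of p(y) * H(X | Y = y)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / len(xs) * entropy(g) for g in groups.values())


def info_gain(xs, ys):
    """IG(X; Y) = H(X) - H(X | Y)."""
    return entropy(xs) - cond_entropy(xs, ys)


def gain_ratio(feature, labels):
    """IG(C; F) / H(F): dividing by the split information H(F)
    penalizes features that take many distinct values."""
    split_info = entropy(feature)
    return info_gain(labels, feature) / split_info if split_info > 0 else 0.0


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    return 2.0 * info_gain(xs, ys) / denom if denom > 0 else 0.0


def cfs_merit(subset, columns, labels):
    """CFS merit k * r_cf / sqrt(k + k(k-1) * r_ff) for a non-empty subset.
    Assumption (hypothetical): gain ratio gives r_cf, SU gives r_ff."""
    k = len(subset)
    r_cf = sum(gain_ratio(columns[i], labels) for i in subset) / k
    pairs = [(a, b) for pos, a in enumerate(subset) for b in subset[pos + 1:]]
    r_ff = (sum(symmetric_uncertainty(columns[a], columns[b]) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_search(columns, labels):
    """Greedy forward search: add the feature that most raises the merit,
    stop as soon as no candidate strictly improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        cand = max(remaining,
                   key=lambda j: cfs_merit(selected + [j], columns, labels))
        merit = cfs_merit(selected + [cand], columns, labels)
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: identical to the class
    [0, 0, 0, 0, 1, 1, 1, 1],  # f1: redundant copy of f0
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: independent of the class
    [0, 1, 2, 3, 4, 5, 6, 7],  # f3: ID-like, one value per sample
]
print(info_gain(labels, columns[3]), gain_ratio(columns[3], labels))  # 1.0 0.3333333333333333
print(cfs_forward_search(columns, labels))  # [0]
```

On this toy data, plain information gain rates the ID-like feature f3 as highly as f0 (1.0 bit each), whereas the gain ratio cuts it to 1/3. The redundant copy f1 leaves the merit unchanged, so the search stops with only f0.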
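The wrapper stage, sequential backward selection guided by a decision tree, can be sketched in the same spirit. Here a small ID3-style tree using plain information gain stands in for the decision tree, and leave-one-out accuracy stands in for the evaluation criterion; both are assumptions, since the abstract specifies neither the tree algorithm nor the validation scheme. The input feature list is assumed to be whatever the filter stage retained, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """ID3-style tree (a stand-in, not the paper's tree). A leaf is a bare
    label; an inner node is (feature, {value: subtree}, majority_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_entropy(f):
        # Weighted entropy of the labels after splitting on feature f.
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(features, key=split_entropy)  # lowest entropy = highest gain
    if split_entropy(best) >= entropy(labels):
        return majority  # no feature reduces entropy: make a leaf
    rest = [f for f in features if f != best]
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (best, children, majority)


def predict(tree, row):
    while isinstance(tree, tuple):
        feature, children, majority = tree
        if row[feature] not in children:
            return majority  # unseen value: fall back to the node majority
        tree = children[row[feature]]
    return tree


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy (assumed criterion) using only `features`."""
    hits = 0
    for i in range(len(rows)):
        tree = build_tree(rows[:i] + rows[i + 1:],
                          labels[:i] + labels[i + 1:], features)
        hits += predict(tree, rows[i]) == labels[i]
    return hits / len(rows)


def sequential_backward_selection(rows, labels, features):
    """Remove one feature at a time, always the one whose removal hurts
    accuracy least; return the best subset seen (smaller wins ties)."""
    current = list(features)
    best_subset = list(current)
    best_score = loo_accuracy(rows, labels, current)
    while len(current) > 1:
        scores = {g: loo_accuracy(rows, labels, [f for f in current if f != g])
                  for g in current}
        drop = max(current, key=scores.get)
        current.remove(drop)
        if scores[drop] >= best_score:
            best_subset, best_score = list(current), scores[drop]
    return best_subset, best_score


# Toy data: feature 0 equals the class, feature 1 is noise, feature 2 is constant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [(y, i % 2, 0) for i, y in enumerate(labels)]
print(sequential_backward_selection(rows, labels, [0, 1, 2]))  # ([0], 1.0)
```

Removing the noise feature or the constant feature leaves the leave-one-out accuracy at 1.0, so both are dropped, and the rule that ties go to the smaller subset keeps the single feature 0.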