Mallows' C_p Selection Algorithm for Rough Sets
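  • English title (as published): Mallow's C_p Selection Algorithm for Rough Set
  • Authors: 杨贵军 (YANG Guijun); 于洋 (YU Yang)
  • Keywords: Mallows' C_p criterion; Logistic model; model selection; rough set; generalization ability
  • Journal title (Chinese): KXTS
  • Journal title (English): Journal of Frontiers of Computer Science and Technology
  • Institution: School of Statistics, Tianjin University of Finance and Economics
  • Publication date: 2018-08-29 14:25
  • Publisher: 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology)
  • Year: 2019
  • Issue: v.13; No.126
  • Funding: National Natural Science Foundation of China (11471239); Chongqing Social Science Planning Major Commissioned Project (2016WT03); National Statistical Science Research Key Project (2017LZ25); Tianjin University of Finance and Economics Graduate Research Funding Program (2016TCB03)
  • Language: Chinese
  • Page: KXTS201903018
  • Number of pages: 8
  • CN: 03
  • ISSN: 11-5602/TP
  • Classification number: 165-172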
Abstract
Rough set selection is a key step in empirical research on rough sets. The most common selection criterion at present is the misclassification rate. Because the misclassification rate does not account for the complexity of a rough set, it carries a risk of overfitting, and the rough set with the lowest misclassification rate on a test set does not necessarily have the best generalization ability. Mallows' C_p criterion is therefore introduced as a new rough set selection criterion. The Mallows' C_p selection algorithm for rough sets uses a Logistic model to express the nonlinear rough set classification rules in linear form, takes the C_p value of the Logistic model as the C_p value of the rough set, and selects the rough set according to that value. Empirical applications show that the algorithm can pick out rough sets with strong generalization ability, and that it does so more frequently than the misclassification rate criterion. In particular, when the misclassification rates of several candidate rough sets differ only slightly, the new algorithm is more likely to select a rough set with strong generalization ability. By balancing the classification accuracy and the complexity of rough rules, the Mallows' C_p selection algorithm is better able to select rough sets that generalize well.
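The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes the classical linear-regression form of Mallows' statistic, C_p = SSE_p / σ² − n + 2p, where SSE_p is the residual sum of squares of a model with p parameters, σ² is an estimate of the error variance, and n is the sample size. The abstract does not state how the paper computes these quantities for the Logistic model, so the sketch takes them as given inputs. The names `Candidate`, `mallows_cp`, and `select_by_cp` and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One candidate rough set, summarized by its fitted model (hypothetical)."""
    name: str      # label of the candidate rough set
    sse: float     # residual sum of squares of the fitted model
    n_params: int  # number of fitted parameters p (intercept included)


def mallows_cp(sse: float, sigma2: float, n: int, p: int) -> float:
    """Classical Mallows' C_p: SSE_p / sigma^2 - n + 2p."""
    return sse / sigma2 - n + 2 * p


def select_by_cp(
    candidates: List[Candidate], sigma2: float, n: int
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the name with the smallest C_p."""
    scores = {c.name: mallows_cp(c.sse, sigma2, n, c.n_params) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative numbers only (assumed, not results from the paper):
# candidate B fits slightly better but uses many more parameters.
candidates = [
    Candidate("A", sse=20.0, n_params=3),
    Candidate("B", sse=19.0, n_params=8),
]
best, scores = select_by_cp(candidates, sigma2=0.25, n=80)
print(best, scores)
```

In this toy case candidate B has the smaller residual error, yet candidate A is selected because the 2p term penalizes B's extra parameters. This mirrors the abstract's point that a criterion based on error alone can prefer an overly complex rough set.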
