A Reweighted Semi-Supervised Classification Method for Data with Noisy Labels
  • English title: Reweighting Semi-Supervised Classification for Noisy Labels
  • Authors: CHEN Qian (陈倩); YANG Min (杨旻); WEI Peng-fei (魏鹏飞)
  • Keywords: importance reweighting; noise rate; semi-supervised classification; probability estimation
  • Journal code: YTSZ
  • Journal title (English): Journal of Yantai University (Natural Science and Engineering Edition)
  • Institution: School of Mathematics and Information Sciences, Yantai University
  • Publication date: 2019-07-05
  • Publisher: 烟台大学学报(自然科学与工程版)
  • Year: 2019
  • Volume and cumulative issue number: v.32; No.118
  • Funding: National Natural Science Foundation of China (11771257); Natural Science Foundation of Shandong Province (ZR2018MA008)
  • Language: Chinese
  • Article ID: YTSZ201903001
  • Page count: 5
  • Issue: 03
  • CN: 37-1213/N
  • Pages: 4-8
Abstract
For binary classification problems in which only part of the data is labeled and those labels are noisy, a class of semi-supervised classification algorithms based on importance reweighting is proposed. The label noise rates are estimated using Bayes' formula and unconstrained least-squares fitting, and a BP neural network is then used to solve the resulting weighted optimization problem step by step. Experimental results on several benchmark datasets show that the proposed reweighted semi-supervised classification method effectively reduces the impact of insufficient labels and label noise on classification accuracy.
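The reweighting idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's own formulas, so the code below assumes the standard importance-reweighting weight for class-conditional label noise, beta(x, y~) = P(y = y~ | x) / P_rho(y~ | x). The function names, argument names, and the choice of logistic loss are hypothetical, and the noise rates are taken as given inputs rather than estimated as the paper describes.

```python
import numpy as np

def importance_weights(p_noisy_pos, noisy_labels, rho_pos, rho_neg):
    """Per-sample weights beta = P(y = y~ | x) / P_rho(y~ | x).

    p_noisy_pos : estimated P_rho(y~ = +1 | x) per sample (assumed to come
                  from some probabilistic model fit on the noisy labels)
    noisy_labels: observed labels in {-1, +1}
    rho_pos     : P(y~ = -1 | y = +1), flip rate of true positives
    rho_neg     : P(y~ = +1 | y = -1), flip rate of true negatives
    """
    p_noisy_pos = np.asarray(p_noisy_pos, dtype=float)
    noisy_labels = np.asarray(noisy_labels)
    if rho_pos + rho_neg >= 1.0:
        raise ValueError("noise rates must satisfy rho_pos + rho_neg < 1")
    # Noisy posterior probability of the label that was actually observed.
    p_obs = np.where(noisy_labels == 1, p_noisy_pos, 1.0 - p_noisy_pos)
    # Flip rate of the class opposite to the observed label.
    rho_other = np.where(noisy_labels == 1, rho_neg, rho_pos)
    denom = (1.0 - rho_pos - rho_neg) * p_obs
    weights = (p_obs - rho_other) / np.maximum(denom, 1e-12)
    # Estimation error can push weights below zero; clip them to zero.
    return np.clip(weights, 0.0, None)

def weighted_log_loss(scores, noisy_labels, weights):
    """Mean of beta * log(1 + exp(-y~ * f(x))), the reweighted logistic loss."""
    margins = np.asarray(noisy_labels) * np.asarray(scores, dtype=float)
    return float(np.mean(np.asarray(weights) * np.logaddexp(0.0, -margins)))

if __name__ == "__main__":
    w = importance_weights([0.8, 0.8], [1, -1], rho_pos=0.1, rho_neg=0.2)
    print(w, weighted_log_loss([0.5, -0.5], [1, -1], w))
```

With both noise rates set to zero every weight equals 1 and the loss reduces to the ordinary logistic loss, which is a quick sanity check on any implementation of this scheme.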
