Abstract
For binary classification problems in which only part of the data is labeled and the available labels are noisy, a class of semi-supervised classification algorithms based on importance reweighting is proposed. The label noise rates are estimated using Bayes' formula and unconstrained least-squares fitting, and a BP (backpropagation) neural network is then used to solve the resulting weighted optimization problem step by step. Experimental results on several benchmark datasets show that the proposed reweighted semi-supervised classification method effectively reduces the impact of label scarcity and label noise on classification accuracy.
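The abstract does not spell out the weighting formula, so the following is only a minimal sketch of how importance weights are commonly computed under class-conditional label noise. It assumes the noise rates have already been estimated and that an estimate of the noisy-label posterior is available. The function name `importance_weights` and the parameters `rho_pos` and `rho_neg` are hypothetical names chosen for illustration, not taken from the paper.

```python
import numpy as np

def importance_weights(noisy_posterior, noisy_labels, rho_pos, rho_neg):
    """Per-sample weights for learning with class-conditional label noise.

    noisy_posterior: estimated P(noisy label = +1 | x) for each sample
    noisy_labels:    observed (possibly flipped) labels in {-1, +1}
    rho_pos:         assumed flip rate P(noisy = -1 | true = +1)
    rho_neg:         assumed flip rate P(noisy = +1 | true = -1)
    """
    p = np.asarray(noisy_posterior, dtype=float)
    y = np.asarray(noisy_labels)
    # Probability of the label that was actually observed.
    p_obs = np.where(y == 1, p, 1.0 - p)
    # Flip rate of the class opposite to the observed label.
    rho_other = np.where(y == 1, rho_neg, rho_pos)
    # weight = P(true = y | x) / P(noisy = y | x)
    denom = (1.0 - rho_pos - rho_neg) * p_obs
    w = (p_obs - rho_other) / np.maximum(denom, 1e-12)
    # Estimation error can push weights slightly below zero; clip them.
    return np.clip(w, 0.0, None)

# Two samples with the same noisy posterior but different observed labels.
weights = importance_weights([0.9, 0.9], [1, -1], rho_pos=0.1, rho_neg=0.2)
```

Weights of this kind would multiply the per-sample loss when training the classifier, so that samples whose observed label is likely to be a flip contribute less. With both noise rates set to zero, every weight reduces to 1 and the objective becomes the ordinary unweighted loss.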