Abstract
Multi-instance multi-label (MIML) learning is a machine learning framework proposed to handle ambiguity. In this framework, an object is represented by a set of instances and is associated with a set of class labels. E-MIMLSVM+ is a classical MIML classification algorithm based on the degeneration strategy. Because it cannot learn from unlabeled samples, it suffers from problems such as poor generalization ability. This paper improves the algorithm with a semi-supervised support vector machine. The improved algorithm learns from a small number of labeled samples together with a large number of unlabeled samples, which helps to uncover the structural information hidden in the sample set and to reflect its true distribution. Comparison experiments show that the improved algorithm effectively improves the generalization performance of the classifier.
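The data layout described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the names `Bag`, `split_by_supervision`, and `to_binary_tasks`, the feature values, and the label names are all hypothetical. The label-wise decomposition shown is one simple way to degenerate a multi-label problem; the abstract does not specify how E-MIMLSVM+ performs this step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Instance = List[float]  # one feature vector (hypothetical representation)


@dataclass
class Bag:
    """One MIML object: a set of instances plus a set of class labels."""
    instances: List[Instance]
    labels: Optional[Set[str]] = None  # None marks an unlabeled object


def split_by_supervision(bags: List[Bag]) -> Tuple[List[Bag], List[Bag]]:
    """Separate labeled bags from unlabeled ones (the semi-supervised setting)."""
    labeled = [b for b in bags if b.labels is not None]
    unlabeled = [b for b in bags if b.labels is None]
    return labeled, unlabeled


def to_binary_tasks(labeled: List[Bag], label_space: List[str]) -> Dict[str, List[int]]:
    """Assumed label-wise degeneration: one +1/-1 target list per label."""
    return {
        label: [1 if label in bag.labels else -1 for bag in labeled]
        for label in label_space
    }


# Toy data: two labeled bags and one unlabeled bag.
bags = [
    Bag([[0.1, 0.9], [0.4, 0.2]], {"sky", "sea"}),
    Bag([[0.8, 0.3]], {"sea"}),
    Bag([[0.5, 0.5], [0.7, 0.1], [0.2, 0.6]]),  # no labels available
]

labeled, unlabeled = split_by_supervision(bags)
tasks = to_binary_tasks(labeled, ["sky", "sea"])
```

In a semi-supervised setting, the `unlabeled` bags would be passed to the learner alongside each binary task instead of being discarded; how they enter the objective depends on the specific semi-supervised SVM used.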