Naive Bayes Classification Algorithm of Feature Weighting Based on Two-Dimensional Information Gain
  • English title: Naive Bayes Classification Algorithm of Feature Weighting Based on Two-Dimensional Information Gain
  • Authors: 任世超 (REN Shi-Chao); 黄子良 (HUANG Zi-Liang)
  • Keywords: naive Bayes; text classification; feature weighting; two-dimensional information gain; weighting algorithm
  • Journal code: XTYY
  • Journal: Computer Systems & Applications
  • Affiliation: School of Communication Engineering, Chengdu University of Information Engineering (成都信息工程大学通信工程学院)
  • Publication date: 2019-06-15
  • Publisher: 计算机系统应用 (Computer Systems & Applications)
  • Year: 2019
  • Volume: 28
  • Language: Chinese
  • Article ID: XTYY201906020
  • Page count: 6
  • Issue: 06
  • CN: 11-2854/TP
  • Pages: 137-142
Abstract
Naive Bayes assumes that features are independent, and the traditional TF-IDF weighting scheme considers only how a feature is distributed across the whole training set, ignoring the relationship between a feature and the categories and documents. As a result, the weights that traditional methods assign cannot accurately represent the features. To address these problems, this study proposes a naive Bayes classification algorithm weighted by two-dimensional information gain. It additionally accounts for the effect on classification of the two dimensions of a feature's information gain: the feature-category information gain and the feature-document information gain. In experiments against the traditional weighted naive Bayes algorithm, the proposed algorithm improves precision, recall, and F1 by about 6%.
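This record does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea of information-gain-weighted naive Bayes, not the authors' method. It assumes a multinomial model with Laplace smoothing, uses the standard class-level information gain of a term's presence or absence as the weight, and omits the feature-document information gain because its definition is not given here. All names (`class_information_gain`, `train`, `predict`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def class_information_gain(docs, labels, term):
    """IG = H(C) - H(C | term present or absent), computed over documents."""
    present = Counter(l for d, l in zip(docs, labels) if term in d)
    absent = Counter(l for d, l in zip(docs, labels) if term not in d)
    conditional = 0.0
    for part in (present, absent):
        size = sum(part.values())
        if size:
            conditional += (size / len(docs)) * entropy(part.values())
    return entropy(Counter(labels).values()) - conditional


def train(docs, labels):
    """Collect per-class term counts, class priors, and one IG weight per term."""
    vocab = sorted({t for d in docs for t in d})
    weights = {t: class_information_gain(docs, labels, t) for t in vocab}
    term_counts = defaultdict(Counter)
    for d, l in zip(docs, labels):
        term_counts[l].update(d)
    priors = {c: n / len(docs) for c, n in Counter(labels).items()}
    return vocab, weights, term_counts, priors


def predict(model, doc):
    """Return the class maximizing log P(c) + sum_t w_t * log P(t | c)."""
    vocab, weights, term_counts, priors = model
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(term_counts[c].values())
        score = math.log(prior)
        for t in doc:
            if t not in weights:
                continue  # ignore terms never seen in training
            # Laplace-smoothed multinomial estimate of P(t | c)
            p = (term_counts[c][t] + 1) / (total + len(vocab))
            score += weights[t] * math.log(p)  # assumed weighting: P(t|c)^w
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "win", "goal"],
    ["stock", "market", "price"],
    ["price", "market", "team"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train(docs, labels)
```

In this toy corpus, "goal" appears only in one class and so receives a larger weight than "team", which appears in both; a term that separates the classes well therefore contributes more to the class score than an ambiguous one.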
