Abstract
To address the difficulty of manually annotating large-scale corpora, this paper proposes an automatic emotion annotation method for news comments based on the Probabilistic Latent Semantic Analysis (PLSA) model. First, PLSA is used to compute the "document-topic" and "word-topic" probability matrices of the corpus. Next, using an emotion ontology lexicon together with the "word-topic" matrix, each topic is assigned an emotion category, on the assumption that the topic in which words of a given emotion category occur with the highest probability shares that category. Finally, using the "document-topic" matrix, each document is annotated on the assumption that a document shares the emotion category of the topic in which it occurs with the highest probability; annotation thus follows the chain from words to topics to documents. Experimental results show that the accuracy of the proposed method can exceed 90%.
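The two labeling steps described in the abstract can be illustrated with a minimal sketch. It assumes the PLSA matrices are already available and replaces them with hand-written toy values; the vocabulary, the lexicon entries, the category names, and the helper functions `label_topics` and `label_documents` are all hypothetical. The sketch also assumes one plausible reading of the topic-labeling rule (each topic takes the category whose lexicon words carry the most probability mass in it), which may differ in detail from the authors' procedure.

```python
import numpy as np

# Hypothetical toy inputs; in the paper these come from a fitted PLSA model.
vocab = ["happy", "great", "angry", "awful", "news"]
# Hypothetical emotion lexicon: word -> emotion category.
lexicon = {"happy": "joy", "great": "joy", "angry": "anger", "awful": "anger"}

# word_topic[w, z] = P(w | z); each column sums to 1.
word_topic = np.array([
    [0.40, 0.05],
    [0.30, 0.05],
    [0.05, 0.40],
    [0.05, 0.30],
    [0.20, 0.20],
])
# doc_topic[d, z] = P(z | d); each row sums to 1.
doc_topic = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
])


def label_topics(word_topic, vocab, lexicon):
    """Give each topic the category whose lexicon words have the most mass in it."""
    categories = sorted(set(lexicon.values()))
    # mass[c, z] = summed P(w | z) over lexicon words of category c.
    mass = np.zeros((len(categories), word_topic.shape[1]))
    for i, word in enumerate(vocab):
        if word in lexicon:
            mass[categories.index(lexicon[word])] += word_topic[i]
    return [categories[c] for c in mass.argmax(axis=0)]


def label_documents(doc_topic, topic_labels):
    """Give each document the label of its most probable topic."""
    return [topic_labels[z] for z in doc_topic.argmax(axis=1)]


topic_labels = label_topics(word_topic, vocab, lexicon)
doc_labels = label_documents(doc_topic, topic_labels)
print(topic_labels, doc_labels)
```

Words that are absent from the lexicon (here "news") contribute nothing to the topic labels, so in this sketch only emotion-bearing vocabulary influences the annotation.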