A PLSA-Based Method for Automatic Annotation of Emotion Categories in News Comments (基于PLSA的新闻评论情绪类别自动标注方法)
  • English title: Automatic Annotation of News Comments Emotion Based on PLSA
  • Authors: 林江豪; 顾也力; 周咏梅; 阳爱民
  • Authors (English): LIN Jiang-Hao; GU Ye-Li; ZHOU Yong-Mei; YANG Ai-Min
  • Keywords: 语料库 (corpus); 情绪类别 (emotion category); PLSA模型 (PLSA model); 语料标注 (corpus annotation); 自动标注 (automatic annotation)
  • Keywords (English): corpus; emotional category; PLSA model; corpus annotation; automatic annotation
  • Chinese journal title: XTYY
  • English journal title: Computer Systems & Applications
  • Institutions: Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies; School of Information Science and Technology, Guangdong University of Foreign Studies; Faculty of Asian Languages and Cultures, Guangdong University of Foreign Studies
  • Publication date: 2019-01-15
  • Publisher: 计算机系统应用 (Computer Systems & Applications)
  • Year: 2019
  • Issue: v.28
  • Funding: Humanities and Social Sciences Research Project of the Ministry of Education (14YJA740011); Guangzhou Philosophy and Social Sciences "13th Five-Year Plan" 2018 Project (2018GZQN27); Science and Technology Planning Project of Guangdong Province (2017A040406025); National Natural Science Foundation of China (61877013)
  • Language: Chinese
  • Page: XTYY201901031
  • Number of pages: 5
  • CN: 01
  • ISSN: 11-2854/TP
  • Classification number: 209-213
Abstract
To address the difficulty of manually annotating large-scale corpora, this study proposes a method for automatically annotating the emotion categories of news comments based on the Probabilistic Latent Semantic Analysis (PLSA) model. First, PLSA is used to compute the "document-topic" and "word-topic" probability matrices of the corpus. Then, drawing on an emotion ontology lexicon together with the "word-topic" matrix, each topic is annotated with an emotion category, on the assumption that the topic in which the words of a given emotion category occur with the highest probability shares that category. Finally, documents are annotated via the "document-topic" matrix, on the assumption that a document shares the emotion category of the topic under which it has the highest probability. Automatic annotation is thus achieved through the "word-topic-document" relationship. The experimental results show that the accuracy of the proposed method can exceed 90%.
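The two labelling steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the PLSA matrices have already been estimated, and it takes one possible reading of the topic rule, in which each topic receives the category whose lexicon words carry the most probability mass in that topic. The function names (`label_topics`, `label_documents`), the toy matrices, and the toy lexicon are all hypothetical.

```python
def label_topics(word_topic, lexicon):
    """Assign an emotion category to each topic.

    word_topic: list with one dict per topic, mapping word -> P(word | topic).
    lexicon: dict mapping word -> emotion category (stands in for the
             emotion ontology lexicon; hypothetical toy data here).
    Returns one label per topic, or None if no lexicon word occurs in it.
    """
    labels = []
    for dist in word_topic:
        mass = {}  # emotion category -> summed probability in this topic
        for word, prob in dist.items():
            category = lexicon.get(word)
            if category is not None:
                mass[category] = mass.get(category, 0.0) + prob
        labels.append(max(mass, key=mass.get) if mass else None)
    return labels


def label_documents(doc_topic, topic_labels):
    """Give each document the label of its highest-probability topic.

    doc_topic: list of rows, doc_topic[d][z] = P(topic z | document d).
    """
    result = []
    for row in doc_topic:
        best_topic = max(range(len(row)), key=row.__getitem__)
        result.append(topic_labels[best_topic])
    return result


# Toy "word-topic" matrix for two topics (assumed values, not from the paper).
word_topic = [
    {"happy": 0.5, "great": 0.3, "angry": 0.1, "news": 0.1},
    {"angry": 0.6, "awful": 0.2, "happy": 0.05, "news": 0.15},
]
# Toy emotion lexicon (assumed).
lexicon = {"happy": "joy", "great": "joy", "angry": "anger", "awful": "anger"}
# Toy "document-topic" matrix for three comments (assumed values).
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]

topic_labels = label_topics(word_topic, lexicon)
doc_labels = label_documents(doc_topic, topic_labels)
print(topic_labels)
print(doc_labels)
```

A topic that contains no lexicon words is left unlabelled (`None`) in this sketch, and so is any document whose dominant topic is unlabelled. How such cases are handled in the paper is not stated in the abstract.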
