Research on Generation Methods for Evolutionary Dynamic News Document Summarization
Abstract
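In today's society, riding the wave of rapid progress in computer science over the past decade, multi-document summarization has gradually grown into an exciting and challenging research frontier, typically addressed by combining techniques from natural language processing and information retrieval. As information on the Internet grows rapidly, people seeking information often find it hard to keep up with the frequency and speed of updates. News pours across the Internet like a flood, and people are easily "drowned" in a vast sea of information without knowing where to begin. Automatic understanding of news has therefore become an important component of Web information processing.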
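     For an evolving news topic, people often have several interests at once, such as how the event started, how it developed, and what the current situation is, but traditional news understanding techniques are not sufficient to meet this need. Ordinary search engines can only rank news data by query relevance according to their own understanding, and they struggle with news-topic queries whose intent is ambiguous. Moreover, even if the ranking returned by a search engine were ideal (which is unlikely), users would be unwilling to read the documents one by one. People want a simple way of browsing that lets them grasp the whole course of development and the evolutionary trajectory of an event. News summarization is a good solution: it provides a compressed, highly informative reorganization and presentation of the documents, so that users can easily follow how an event develops. We propose the concept of a "Timeline", which dynamically summarizes an evolving news story along the time dimension into a series of component summaries that are individually self-contained yet mutually dependent, thereby offering a flexible way to present a panoramic overview of the event's development.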
     The specific work and contributions of this paper are as follows:
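     1. We propose a new text segmentation algorithm for news documents. Compared with traditional multi-document summarization tasks, evolutionary news summarization faces a much larger news collection. Therefore, before summary generation begins, we first carry out preprocessing tailored to the characteristics of news. A news document is not indivisible: it usually contains more than one event, and each event may represent one aspect of a news topic, so we extract from news documents the news snippets that have the characteristics of atomic events. The atomic events within the same news document are also, to some extent, independent of one another, so not all of them are closely related to a particular news topic. Through a fine-grained event extraction process, we can remove descriptive sentences unrelated to any event and filter out atomic events unrelated to the current news topic, which compresses and preprocesses the large collection in one step. The challenge of this work is also clear: snippet extraction must cope with constraints from linguistic features (such as text, named entities, and time), syntactic features (such as sentence offsets and conjunctions), and visual layout elements.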
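     2. We introduce a new summarization task, "evolutionary dynamic news document summarization", and propose two algorithmic frameworks for it, both of which can be generalized to any dependent summary generation problem. Given a collection of documents on a news topic, the system automatically outputs a timeline whose component summaries represent the trajectory of the event as it develops over time. The first method is a framework that optimally combines a global biased graph ranking algorithm with a local biased graph ranking algorithm, taking into account both the dependencies between sentences across dates and the dependencies among sentences on the same date. The cross-date dependencies are modeled by a temporal projection function that projects sentences from all other dates onto the plane of a specific date. The second method is an iterative sentence substitution framework under constraints, which selects the best subset of sentences from a sentence set to form a summary: the component summaries are not completely independent but are optimized and refined with respect to their neighboring component summaries, reflecting the evolutionary nature of news. We evaluate each component summary from two perspectives: locally, based on the neighboring dates, and globally, based on the dates of the whole collection.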
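     3. We propose, for the first time, the concept of visualized evolutionary dynamic news document summarization, together with a solution for visual summaries based on an iterative mutual reinforcement framework. Given a news topic and a related collection of time-stamped documents, the system generates an evolutionary news summary with visual information that contains a text part and an image part, each explaining and complementing the other. Each component summary represents a stage in the development of the event and is constrained by an optimization condition based on global information. Here images can serve as cues for the sentences to be summarized, which changes the traditional way of generating text summaries to advantage. To generate visual summaries we use two heterogeneous data streams, of which the image stream has usually been ignored in earlier document summarization methods. In addition, because we use two heterogeneous data streams at the same time, we need a translation model to build a bridge between the two semantic dimensions and cross the semantic gap. Every component summary contains two parts, text and images; the choice of images affects the choice of text, and vice versa. We propose an effective way to ensure that the two parts are well matched through mutual reinforcement, and we optimize the generation of all component summaries jointly under global-local constraints.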
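     4. We propose two extensions that can be integrated into evolutionary dynamic news document summarization. The first is user personalization. Because users have personal preferences and therefore lean toward certain kinds of content, generating one identical summary for all users is clearly not enough. We propose an interactive summarization method that lets users interact with the summarization system by "clicking" and "examining". This human-computer interaction allows users to click a sentence and examine the source document of that content, which provides real-time pseudo-relevance feedback. Such an implicit "click log" reflects what people are interested in. Because user clicks can be sparse, we use "click smoothing" to extend the influence of the click data. The second extension is to bring in information about what the general public is focusing on, which we capture as auxiliary information from the social media data of Twitter. Twitter does not consist only of a series of posts: behind the posts lies a latent network of relationships, including the "follower" relationships between users and the "retweet" relationships between posts. Information on public focus should be popular and should avoid redundancy as far as possible. Using a framework that co-ranks two heterogeneous kinds of nodes, users and posts, we integrate popularity and diversity and select the public-focus information with a random-walk-based ranking framework.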
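     The temporal projection and biased graph ranking described in item 2 can be illustrated with a small sketch. The abstract does not give the projection function, the similarity measure, or the ranking formula, so everything concrete here is an assumption made for illustration: the Gaussian decay kernel and its `sigma`, the word-overlap (Jaccard) similarity, the damping factor, and the function names `decay_weight`, `jaccard`, and `rank_for_date`. It should be read as one plausible instance of "project sentences from all dates onto a target date, then rank with a bias toward that date", not as the method of the thesis.

```python
import math


def decay_weight(day, target_day, sigma=2.0):
    """Weight of a sentence from `day` when projected onto `target_day`.

    The Gaussian kernel is an illustrative assumption; the source only
    states that some temporal projection function is used.
    """
    gap = day - target_day
    return math.exp(-(gap * gap) / (2.0 * sigma * sigma))


def jaccard(text_a, text_b):
    """Word-overlap similarity between two sentences (assumed measure)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def rank_for_date(sentences, target_day, damping=0.85, iters=50, sigma=2.0):
    """Score every sentence for one date of the timeline.

    sentences: non-empty list of (day, text) pairs.
    Returns a list of scores, aligned with `sentences`, that sums to 1.
    """
    n = len(sentences)
    # Project every sentence onto the target date.
    proj = [decay_weight(day, target_day, sigma) for day, _ in sentences]

    # Edge weight = textual similarity scaled by both projected weights,
    # so links between sentences far from the target date are weak.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = jaccard(sentences[i][1], sentences[j][1])
                weights[i][j] = sim * proj[i] * proj[j]
    out_sums = [sum(row) for row in weights]

    # The bias (restart) vector favors sentences close to the target date.
    total = sum(proj)
    bias = [p / total for p in proj]

    scores = bias[:]
    for _ in range(iters):
        updated = []
        for j in range(n):
            inflow = 0.0
            for i in range(n):
                if out_sums[i] > 0:
                    inflow += scores[i] * weights[i][j] / out_sums[i]
            updated.append((1.0 - damping) * bias[j] + damping * inflow)
        norm = sum(updated)  # renormalize in case some node has no out-links
        scores = [value / norm for value in updated]
    return scores


docs = [
    (0, "earthquake strikes coastal city"),
    (0, "rescue teams reach coastal city"),
    (5, "rebuilding of coastal city begins"),
]
scores_day0 = rank_for_date(docs, target_day=0)
scores_day5 = rank_for_date(docs, target_day=5)
```

     In this toy collection the same three sentences receive different scores depending on the target date, which is the behavior a per-date component summary needs: the top-scoring sentences for each date would be candidates for that date's summary.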
Multi-document summarization has for decades been an exciting and challenging field of joint research in Natural Language Processing (NLP) and Information Retrieval (IR). Faced with the rapid information explosion on the World Wide Web, information seekers can hardly keep pace with the flood of new updates. News spreads throughout the Internet, and readers drown in a "sea" of overwhelming information, not knowing where to start. As a result, news digestion has become increasingly essential in Web content analysis.
     For an evolutionary news topic, people may have a myriad of general interests, such as how it began, how it evolved, and what the most up-to-date situation is. Traditional techniques, however, are to some extent insufficient. General search engines simply rank news documents according to their understanding of query relevance, but they are not quite capable of handling queries with ambiguous intent. In many cases, even if the ranked documents are in a satisfying order, readers tire of navigating all the data in a massive collection: they would rather monitor the evolution trajectory of hot news by brief browsing. Summarization is an ideal way out of this dilemma, providing a condensed, informative reorganization of documents for a faster and better representation of news evolution. Our proposed Timeline temporally summarizes evolving news as a series of individual but correlated component summaries along the temporal dimension, and hence offers a way to understand the big picture of a developing situation.
     To summarize, the contributions of this paper include:
     1. We propose a novel text segmentation method designed for news understanding. As we face a much larger corpus than traditional summarization tasks do, we conduct news pre-processing before summarization starts. We extract text snippets representing atomic "events" from news documents. News articles are not indivisible: they usually contain more than one event, where each event denotes an aspect of a news topic. Events within the same news document are sometimes independent of each other, so not all of them are equally relevant to a particular news topic. After the fine-grained event distilling procedure, we compress the corpus by discarding non-event descriptions and filtering out snippets that are not relevant to any of the topic words. Snippet extraction is challenging because of the complicated discourse structure of natural language and because segment boundaries must be located with rich event-oriented features: semantic (similarity, named entities, temporal distance), syntactic (conjunctions, sentence offsets), and layout elements.
     2. We introduce a novel framework for the web mining service Evolutionary Timeline Summarization (ETS). Taking a news collection as input, the system automatically outputs a timeline whose items are component summaries representing evolutionary trajectories on specific dates. We propose two ways to solve the problem. The first is based on an optimized combination of a global biased ranking framework and a local biased ranking framework, with inter-date dependencies and intra-date dependencies respectively. In particular, the inter-date dependency calculation includes temporal decays to project sentences from all dates onto a time horizon. The second is a balanced optimization framework that iteratively substitutes sentences, under constraints, to move from a set of sentences to a subset: the component summaries are not assumed to be completely isolated, because neighboring summaries are generated inter-dependently owing to the characteristics of news over time. We use two criteria to evaluate the quality of component summaries: locally, i.e., based on temporally adjacent neighbors, and globally, i.e., based on the whole collection.
     3. We propose an iterative reinforcement approach to the problem of Visual Timeline Summarization (VTS), and we introduce the concept of visual timelines for the first time. Given a massive collection of time-stamped web documents related to a general news subject, the system automatically outputs a visual timeline whose items are component summaries with texts and images as mutual descriptions. Component summaries, iteratively refined by global information, represent evolutionary trajectories across dates. Images, as hints for which sentences to summarize, alter the traditional way of textual summarization and are hence beneficial. For the VTS problem we utilize two heterogeneous streams of content, of which images have long been overlooked in summarization work. Besides, as texts and images are heterogeneous sources, it is challenging to bridge the semantic gap between the two modalities. As component summaries have two parts, texts and images, the choice of images influences the text selection and vice versa. We propose an effective approach that uses mutual reinforcement to ensure that the texts and images within the generated timeline are appropriately matched, and we formulate the problem as a global-to-local scenario, i.e., using the global timeline summary to refine local component summaries iteratively.
     4. We provide two possible extensions to incorporate into evolutionary timeline summarization. The first is to combine the general timeline with user personalization. Since users may be biased in what they prefer to read owing to their individual interests, a universal summary for all users is obviously not satisfactory, so we introduce a mechanism of Interactive Personalized Summarization (IPS) based on "click" and "examine" interactions between readers and contents. The human-system interaction supports clicking into sentences and examining source contexts for real-time pseudo feedback. The implicit clickthrough data indicates what users are interested in. User click data is often sparse, but we amplify these tiny hints of user interest by "click smoothing". The second possible direction is to incorporate mass media focus from the online social network service Twitter, capturing the general interests of society as auxiliary information. The Twitter system is not simply made up of a set of tweets: there are latent networks, including the following relationships among users and the retweeting linkage. Information on the mass focus should be popular and avoid redundancy. We utilize a unified co-ranking framework, i.e., ranking the vertices of tweets and Twitter users on a heterogeneous graph, and fuse popularity and diversity simultaneously in the random walk paradigm.
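     The "click smoothing" idea in item 4 can be made concrete with a minimal sketch. The abstract only says that sparse clicks are amplified; how this is done is not stated. The version below assumes, purely for illustration, that each sentence keeps its own click count and additionally borrows a share of the clicks on other sentences in proportion to a precomputed similarity. The function name `smooth_clicks`, the parameter `alpha`, and the similarity matrix are all hypothetical.

```python
def smooth_clicks(clicks, similarity, alpha=0.5):
    """Spread sparse click counts to similar, unclicked sentences.

    clicks:     list of raw click counts, one per sentence.
    similarity: square matrix (list of lists) of pairwise sentence
                similarities in [0, 1]; assumed to be given.
    alpha:      how strongly clicks on other sentences carry over
                (an assumed tuning parameter).
    Returns a list of smoothed interest scores, one per sentence.
    """
    n = len(clicks)
    smoothed = []
    for i in range(n):
        # Clicks on other sentences count in proportion to their similarity.
        borrowed = sum(similarity[i][j] * clicks[j] for j in range(n) if j != i)
        smoothed.append(clicks[i] + alpha * borrowed)
    return smoothed


# One click on sentence 0; sentence 1 is similar to it, sentence 2 is not.
raw_clicks = [1, 0, 0]
sim = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
interest = smooth_clicks(raw_clicks, sim)
```

     Here the single click raises the interest score of the similar sentence 1 while leaving the unrelated sentence 2 at zero, so a personalized summary could favor content near what the reader has already examined even when very few clicks are available.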
    [114] L. Tang, T. Li, and C.-S. Perng. Logsig: generating system events from raw textual logs.In Proceedings of the20th ACM international conference on Information and knowledgemanagement, CIKM’11, pages785–794, New York, NY, USA,2011. ACM.
    [115] J. Teevan, S. T. Dumais, and E. Horvitz. Personalizing search via automated analysis ofinterests and activities. In Proceedings of the28th annual international ACM SIGIR con-ference on Research and development in information retrieval, SIGIR’05, pages449–456,New York, NY, USA,2005. ACM.
    [116] J. Teevan, D. Ramage, and M. R. Morris.#twittersearch: a comparison of microblog searchand web search. In Proceedings of the fourth ACM international conference on Web searchand data mining, WSDM’11, pages35–44, New York, NY, USA,2011. ACM.
    [117] M. Tomasoni and M. Huang. Metadata-aware measures for answer summarization in com-munity question answering. In Proceedings of the48th Annual Meeting of the Associationfor Computational Linguistics, ACL’10, pages760–769, Stroudsburg, PA, USA,2010. As-sociation for Computational Linguistics.
    [118] S. Tong and D. Koller. Support vector machine active learning with applications to textclassification. J. Mach. Learn. Res.,2:45–66, Mar.2002.
    [119] P. van Mulbregt, I. Carp, L. Gillick, S. Lowe, and J. Yamron. Text segmentation and topictracking on broadcast news via a hidden markov model approach. In Proceedings of the5th International Conference on Spoken Language Processing, pages2519–2522. Citeseer,1998.
    [120] S. Wan and C. Paris. In-browser summarisation: generating elaborative summaries biasedtowards the reading context. In Proceedings of the46th Annual Meeting of the Association forComputational Linguistics on Human Language Technologies: Short Papers, HLT-Short’08,pages129–132, Stroudsburg, PA, USA,2008. Association for Computational Linguistics.
    [121] X. Wan. Timedtextrank: adding the temporal dimension to multi-document summarization.In Proceedings of the30th annual international ACM SIGIR conference on Research and de-velopment in information retrieval, SIGIR’07, pages867–868, New York, NY, USA,2007.ACM.
    [122] X. Wan. An exploration of document impact on graph-based multi-document summarization.In Proceedings of the Conference on Empirical Methods in Natural Language Processing,EMNLP’08, pages755–762, Stroudsburg, PA, USA,2008. Association for ComputationalLinguistics.
    [123] X. Wan. Topic analysis for topic-focused multi-document summarization. In Proceedingsof the18th ACM conference on Information and knowledge management, CIKM’09, pages1609–1612, New York, NY, USA,2009. ACM.
    [124] X. Wan. Update summarization based on co-ranking with constraints. In COLING (Posters),pages1291–1300,2012.
    [125] X. Wan, H. Jia, S. Huang, and J. Xiao. Summarizing the differences in multilingual news. InProceedings of the34th international ACM SIGIR conference on Research and developmentin Information Retrieval, SIGIR’11, pages735–744, New York, NY, USA,2011. ACM.
    [126] X. Wan and J. Xiao. Single document keyphrase extraction using neighborhood knowledge.In Proceedings of AAAI, pages855–860,2008.
    [127] X. Wan and J. Xiao. Graph-based multi-modality learning for topic-focused multi-documentsummarization. In Proceedings of the21st international jont conference on Artifical intelli-gence, pages1586–1591. Morgan Kaufmann Publishers Inc.,2009.
    [128] X. Wan and J. Yang. Improved affinity graph based multi-document summarization. InProceedings of the Human Language Technology Conference of the NAACL, CompanionVolume: Short Papers, NAACL-Short’06, pages181–184, Stroudsburg, PA, USA,2006.Association for Computational Linguistics.
    [129] X. Wan and J. Yang. Multi-document summarization using cluster-based link analysis. InProceedings of the31st annual international ACM SIGIR conference on Research and de-velopment in information retrieval, SIGIR’08, pages299–306, New York, NY, USA,2008.ACM.
    [130] X. Wan, J. Yang, and J. Xiao. Single document summarization with document expansion. InProceedings of the national conference on artificial intelligence.
    [131] X. Wan, J. Yang, and J. Xiao. Manifold-ranking based topic-focused multi-document summa-rization. In Proceedings of the20th international joint conference on Artifical intelligence,pages2903–2908. Morgan Kaufmann Publishers Inc.,2007.
    [132] X. Wan, J. Yang, and J. Xiao. Towards an iterative reinforcement approach for simultane-ous document summarization and keyword extraction. In Annual Meeting-association ForComputational Linguistics, volume45, page552,2007.
    [133] D. Wang and T. Li. Document update summarization using incremental hierarchical cluster-ing. In Proceedings of the19th ACM international conference on Information and knowledgemanagement, CIKM’10, pages279–288, New York, NY, USA,2010. ACM.
    [134] D. Wang, T. Li, and M. Ogihara. Generating pictorial storylines via minimum-weight con-nected dominating set approximation in multi-view graphs. In AAAI,2012.
    [135] D. Wang, T. Li, S. Zhu, and C. Ding. Multi-document summarization via sentence-levelsemantic analysis and symmetric matrix factorization. In Proceedings of the31st annualinternational ACM SIGIR conference on Research and development in information retrieval,SIGIR’08, pages307–314, New York, NY, USA,2008. ACM.
    [136] D. Wang, M. Ogihara, and T. Li. Summarizing the differences from microblogs. In Pro-ceedings of the35th international ACM SIGIR conference on Research and development ininformation retrieval, SIGIR’12, pages1147–1148, New York, NY, USA,2012. ACM.
    [137] D. Wang, L. Zheng, T. Li, and Y. Deng. Evolutionary document summarization for disastermanagement. In Proceedings of the32nd international ACM SIGIR conference on Researchand development in information retrieval, SIGIR’09, pages680–681, New York, NY, USA,2009. ACM.
    [138] L. Wenjie, W. Furu, L. Qin, and H. Yanxiang. Pnr2: ranking sentences with positive andnegative reinforcement for query-oriented update summarization. In Proceedings of the22ndInternational Conference on Computational Linguistics-Volume1, COLING’08, pages489–496, Stroudsburg, PA, USA,2008. Association for Computational Linguistics.
    [139] M. White, T. Korelsky, C. Cardie, V. Ng, D. Pierce, and K. Wagstaff. Multidocument sum-marization via information extraction. In Proceedings of the first international conferenceon Human language technology research, HLT’01, pages1–7, Stroudsburg, PA, USA,2001.Association for Computational Linguistics.
    [140] M. White, T. Korelsky, C. Cardie, V. Ng, D. Pierce, and K. Wagstaff. Multidocument sum-marization via information extraction. In Proceedings of the first international conference onHuman language technology research, pages1–7. Association for Computational Linguis-tics,2001.
    [141] M. White, T. Korelsky, C. Cardie, V. Ng, D. Pierce, and K. Wagstaff. Multidocument sum-marization via information extraction. In Proceedings of the first international conferenceon Human language technology research, HLT’01, pages1–7, Stroudsburg, PA, USA,2001.Association for Computational Linguistics.
    [142] S. A. Winder and M. Brown. Learning local image descriptors. In IEEE Conference onComputer Vision and Pattern Recognition, CVPR’07, pages1–8. IEEE,2007.
    [143] M. J. Witbrock and V. O. Mittal. Ultra-summarization (poster abstract): a statistical approachto generating highly condensed non-extractive summaries. In Proceedings of the22nd annualinternational ACM SIGIR conference on Research and development in information retrieval,SIGIR’99, pages315–316, New York, NY, USA,1999. ACM.
    [144] L. Xie, J. Zeng, and W. Feng. Multi-scale texttiling for automatic story segmentation inchinese broadcast news. Information Retrieval Technology, pages345–355,2008.
    [145] S. Xu, L. Kong, and Y. Zhang. A picture paints a thousand words: a method of generatingimage-text timelines. In Proceedings of the21st ACM international conference on Informa-tion and knowledge management, CIKM’12, pages2511–2514, New York, NY, USA,2012.ACM.
    [146] R. Yan, L. Kong, C. Huang, X. Wan, X. Li, and Y. Zhang. Timeline generation throughevolutionary trans-temporal summarization. In Proceedings of the Conference on EmpiricalMethods in Natural Language Processing, EMNLP’11, pages433–443, Stroudsburg, PA,USA,2011. Association for Computational Linguistics.
    [147] R. Yan, L. Kong, Y. Li, Y. Zhang, and X. Li. A finegrained digestion of news webpagesthrough event snippet extraction. In Proceedings of the20th international conference com-panion on World wide web, WWW’11, pages157–158, New York, NY, USA,2011. ACM.
    [148] R. Yan, M. Lapata, and X. Li. Tweet recommendation with graph co-ranking. In Proceed-ings of the50th Annual Meeting of the Association for Computational Linguistics: LongPapers-Volume1, ACL’12, pages516–525, Stroudsburg, PA, USA,2012. Association forComputational Linguistics.
    [149] R. Yan, Y. Li, Y. Zhang, and X. Li. Event recognition from news webpages through latentingredients extraction. In Asia Information Retrieval Society, pages490–501. Springer,2010.
    [150] R. Yan, J.-Y. Nie, and X. Li. Summarize what you are interested in: an optimization frame-work for interactive personalized summarization. In Proceedings of the Conference on Em-pirical Methods in Natural Language Processing, EMNLP’11, pages1342–1351, Strouds-burg, PA, USA,2011. Association for Computational Linguistics.
    [151] R. Yan, X. Wan, M. Lapata, W. X. Zhao, P.-J. Cheng, and X. Li. Visualizing timelines:evolutionary summarization via iterative reinforcement between text and image streams. InProceedings of the21st ACM international conference on Information and knowledge man-agement, CIKM’12, pages275–284, New York, NY, USA,2012. ACM.
    [152] R. Yan, X. Wan, J. Otterbacher, L. Kong, X. Li, and Y. Zhang. Evolutionary timeline sum-marization: a balanced optimization framework via iterative substitution. In Proceedings ofthe34th international ACM SIGIR conference on Research and development in InformationRetrieval, SIGIR’11, pages745–754, New York, NY, USA,2011. ACM.
    [153] R. Yan, Z. Yuan, X. Wan, Y. Zhang, and X. Li. Hierarchical graph summarization: Leveraginghybrid information through visible and invisible linkage. In PAKDD (2), pages97–108,2012.
    [154] C. C. Yang and X. Shi. Discovering event evolution graphs from newswires. In Proceedingsof the15th international conference on World Wide Web, WWW’06, pages945–946, NewYork, NY, USA,2006. ACM.
    [155] K. Zhang, J. Zi, and L. G. Wu. New event detection based on indexing-tree and namedentity. In Proceedings of the30th annual international ACM SIGIR conference on Researchand development in information retrieval, SIGIR’07, pages215–222, New York, NY, USA,2007. ACM.
    [156] W. X. Zhao, J. Jiang, J. He, Y. Song, P. Achananuparp, E.-P. Lim, and X. Li. Topicalkeyphrase extraction from twitter. In Proceedings of the49th Annual Meeting of the As-sociation for Computational Linguistics: Human Language Technologies-Volume1, HLT’11, pages379–388, Stroudsburg, PA, USA,2011. Association for Computational Linguis-tics.
    [157] D. Zhou, S. A. Orshanskiy, H. Zha, and C. L. Giles. Co-ranking authors and documents ina heterogeneous network. In Proceedings of the Seventh IEEE International Conference onData Mining, ICDM’07, pages739–744. IEEE,2007.
    [158] X. Zhu and T. Oates. Finding story chains in newswire articles. In Information Reuse andIntegration (IRI),2012IEEE13th International Conference on, pages93–100. IEEE,2012.
