Research on the Information Need Domain in Information Retrieval
Abstract
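Information retrieval is the means by which information is obtained; it is an important part of information processing and a current research focus in that field. It comprises three main components: the expression of the information need, the representation of documents, and the retrieval model. Of these, expressing the need is a crucial step. Good retrieval results are possible only when the need is correctly understood and expressed; if the need is poorly expressed, even the best retrieval system cannot return good results. Today, machine retrieval is essentially keyword matching, and it assumes that the user's query is an accurate description of the user's information need. In practice, a query often fails to describe the need accurately, which inevitably leads to unsatisfactory results.
     Relevance feedback is commonly used to describe and express the user's information need better. Relevance feedback, whether user relevance feedback or pseudo-relevance feedback, tries to find a set of related terms in the feedback documents with which to strengthen the user's initial query. Experiments show that this approach is effective to some degree. However, the terms are selected heuristically, and the approach usually assumes that the user's information need has an accurate description. Relevance feedback tries to use the feedback information to find that accurate description, but in practice an accurate description of the user's need is hard to obtain. Query expansion by relevance feedback is therefore only a guess at the user's information need, and not an accurate one.
     This dissertation takes a different approach. We assume that an information need is a semantic range. The user first issues an initial query; once some feedback information is available (user relevance feedback or pseudo-relevance feedback), we can build a better description of the information need. This description does not try to capture the need exactly; instead it frames, in general terms, the range within which the need lies. We use the feedback information to establish a lower bound $\underline{R}$ and an upper bound $\overline{R}$ of the need, which together delimit its range. The lower bound corresponds to the part shared by all feedback documents, and the upper bound corresponds to the entire content of the feedback documents. The dissertation derives the lower and upper bounds of the need, obtains the two boundaries of the need domain, and thereby establishes the need domain model $I=(\underline{R},\overline{R})$. The information need domain has the following characteristics: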
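     (1) The lower bound of the information need domain expresses what the information need concentrates on; it represents the precision of the need and also its intension;
     (2) The upper bound of the information need domain contains the extended and expanded content of the information need; it represents the breadth of the need and also its extension;
     (3) The information need domain delimits the range of the user's information need fairly loosely.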
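As a minimal sketch of the idea above, assume each feedback document is reduced to a set of terms (the abstract does not specify the representation). The lower bound can then be read as the intersection of the term sets and the upper bound as their union. The function name `build_need_domain` and the toy documents are illustrative and not taken from the dissertation.

```python
from typing import Iterable, Set, Tuple


def build_need_domain(
    feedback_docs: Iterable[Iterable[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) term sets for tokenized feedback documents.

    lower: terms shared by every feedback document (assumed reading of the lower bound)
    upper: terms appearing in at least one feedback document (assumed reading of the upper bound)
    """
    term_sets = [set(doc) for doc in feedback_docs]
    if not term_sets:
        # No feedback yet: the domain is empty.
        return set(), set()
    lower = set.intersection(*term_sets)
    upper = set.union(*term_sets)
    return lower, upper


# Toy feedback documents (hypothetical data).
docs = [
    ["rough", "set", "retrieval", "model"],
    ["rough", "set", "query", "retrieval"],
    ["retrieval", "rough", "feedback"],
]
lower, upper = build_need_domain(docs)
print(sorted(lower))  # terms common to all three documents
print(sorted(upper))  # every term seen in any document
```

By construction the lower bound is always a subset of the upper bound, which matches the description of the domain as a range between a core and its extension.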
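     The dissertation gives two mechanisms for establishing the need domain: user relevant document feedback and pseudo-relevant document feedback. Under the user relevant document feedback mechanism, the user marks several relevant documents among the initial query results, and these documents are used to build the information need domain. The pseudo-relevant document feedback mechanism automatically selects the top n documents from the initial retrieval results and builds the need domain from those n documents. Its advantage is that it is automatic and requires no user involvement. Its drawback is that, because the documents come from pseudo-relevance feedback, not all of them are necessarily documents the user wants, so the resulting information need domain is only an approximation of the user's need domain.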
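The pseudo-relevant mechanism can be sketched in the same spirit, again assuming documents are term sets and the bounds are their intersection and union. The helper `pseudo_feedback_domain` and the ranked list are hypothetical; the example is only meant to show why a non-relevant document among the top n makes the domain an approximation.

```python
from typing import Sequence, Set, Tuple


def pseudo_feedback_domain(
    ranked_docs: Sequence[Sequence[str]], n: int
) -> Tuple[Set[str], Set[str]]:
    """Build (lower, upper) from the top-n documents of an initial ranking.

    The top-n documents are simply assumed to be relevant; no user input is used.
    """
    top = [set(doc) for doc in ranked_docs[:n]]
    if not top:
        return set(), set()
    return set.intersection(*top), set.union(*top)


# Hypothetical initial ranking; the third document is off-topic.
ranked = [
    ["trec", "retrieval", "feedback"],
    ["trec", "retrieval", "query"],
    ["football", "score"],
]

lower2, upper2 = pseudo_feedback_domain(ranked, n=2)
lower3, upper3 = pseudo_feedback_domain(ranked, n=3)
print(sorted(lower2), sorted(upper2))
print(sorted(lower3), sorted(upper3))
```

With n=2 the bounds reflect the topic; with n=3 the off-topic document empties the lower bound and adds unrelated terms to the upper bound, which is the kind of distortion the text attributes to pseudo-relevance feedback.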
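     Building on the need domain, the dissertation analyzes how document similarity is computed and establishes a similarity model based on the need domain. The model is trained and analyzed through a series of experiments on the widely used TREC test collections, and a further series of comparative retrieval experiments is run to verify its effectiveness. In these experiments, the need-domain similarity model is compared with three classic models: the pseudo-relevance feedback language model Mixfb_kl_dir, the pseudo-relevance feedback tf_idf model Fb_tf_idf, and the pseudo-relevance feedback probabilistic model Fb_okapi. The results show that the need-domain similarity model achieves better retrieval performance, indicating that the model is effective and that the results are satisfactory.
     Traditional methods usually try to establish an accurate description of the information need, whereas this dissertation establishes a fairly loose description, using a need domain to delimit the range of the need. In summary, the main contributions of the research are: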
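     (1) It proposes the concept of the user's information need domain and gives a method for determining it;
     (2) It proposes a mathematical model of the information need domain based on rough sets;
     (3) It proposes a similarity model based on the information need domain model.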
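     In short, the main significance of the research lies in strengthening the theoretical basis of information need and, on that basis, establishing a corresponding similarity model that improves retrieval performance. It thereby offers a new line of research for the field of information retrieval, contributes new theory and methods, and improves retrieval efficiency in practical applications.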
Information retrieval, as a means of access to information, is an important part of information processing and a focus of current research in the field. Information retrieval involves three important aspects: the specification of an information need, document description, and the retrieval model. Among them, the specification of the information need is an important step. Good search results are possible only when the information need is properly understood and expressed. At present, information retrieval is essentially implemented as a keyword-matching process, and the user's query is assumed to be an accurate description of the user's information need. In reality, a user's query often cannot describe the underlying information need precisely, which unavoidably leads to unsatisfactory retrieval results.
     To improve the description of the query, relevance feedback is commonly used. This process tries to determine a set of related terms from the (pseudo-)relevant documents to enhance the user's original query. Experiments have shown that the process is effective. However, the selection of these terms is usually performed using heuristics, and the method usually assumes that the user's information need has an accurate description. Relevance feedback attempts to use the feedback to find that accurate description of the user's need, but it is usually impossible to arrive at one. The expanded query is only a best guess at the information need, and it is still inaccurate.
     In this dissertation, we take a different approach. We assume that an information need is a semantic range. At the beginning, the only description of the information need is the original query. When we get some feedback documents (user relevance feedback or pseudo-relevance feedback), we can build a better description of the information need. This description does not try to be accurate; instead, it frames a range for the information need. The feedback information provides a lower bound $\underline{R}$ and an upper bound $\overline{R}$: the lower bound corresponds to what a relevant document should contain (e.g. the common terms shared by all the relevant documents), and the upper bound corresponds to what a relevant document may contain (e.g. all terms that appear in the relevant documents). The dissertation derives the lower and upper bounds of the information need, obtains the two boundaries of the domain, and establishes the information need domain model $I=(\underline{R},\overline{R})$. The information need domain has the following characteristics.
     (1) The lower bound of the information need domain expresses the core of the information need that the user focuses on.
     (2) The upper bound of the information need domain contains the extended content of the information need and represents its breadth.
     (3) The information need domain loosely frames the range of the user's information need.
     The dissertation uses two mechanisms to establish the information need domain: true relevant document feedback from the user and pseudo-relevant document feedback. In the former case, a set of relevant documents identified by the user is used to derive a description of $\underline{R}$ and $\overline{R}$. In the latter case, the top n documents from the initial retrieval results are assumed to be relevant. This method has the advantage of being automatic, but the feedback set may include irrelevant documents, so the resulting information need domain is only an approximation.
     Based on the information need domain, the dissertation analyzes how document similarity is calculated and establishes a similarity model. The model is trained and analyzed through a series of experiments on standard TREC test corpora. The new similarity model based on the information need domain is compared with three classic models: the pseudo-relevance feedback language model Mixfb_kl_dir, the pseudo-relevance feedback tf_idf model Fb_tf_idf, and the pseudo-relevance feedback probabilistic model Fb_okapi. The experimental results show that the similarity model based on the information need domain improves retrieval performance.
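The abstract does not state the similarity formula, so the following is purely an illustrative assumption and not the dissertation's model. It shows one way a score could use both bounds: coverage of the lower bound (core terms) is weighted more heavily than coverage of the terms that lie only in the upper bound. The function `domain_score`, the weight `alpha`, and the data are all hypothetical.

```python
from typing import Iterable, Set


def domain_score(
    doc_terms: Iterable[str],
    lower: Set[str],
    upper: Set[str],
    alpha: float = 0.7,
) -> float:
    """Hypothetical score: weighted coverage of core and extension terms.

    alpha weights the core (lower bound); 1 - alpha weights the terms that
    appear in the upper bound but not in the lower bound.
    """
    doc = set(doc_terms)
    extension = upper - lower
    core_cov = len(doc & lower) / len(lower) if lower else 0.0
    ext_cov = len(doc & extension) / len(extension) if extension else 0.0
    return alpha * core_cov + (1.0 - alpha) * ext_cov


# Toy domain (hypothetical): two core terms, four extension terms.
core = {"rough", "retrieval"}
full = {"rough", "retrieval", "set", "query", "feedback", "model"}

score_a = domain_score(["rough", "retrieval", "set"], core, full)
score_b = domain_score(["query", "feedback", "football"], core, full)
print(score_a, score_b)
```

A document that covers the core terms ranks above one that only touches extension terms, which is one plausible way to use the two bounds; other weightings are equally possible.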
     Traditional methods often attempt to establish an accurate description of the information need. We instead establish a loose description and use a domain to frame a range for the information need. In summary, the main contributions of the research work are:
     (1) We propose the concept of the information need domain for IR and provide a method to determine the information need domain.
     (2) We propose a mathematical model of the information need domain based on rough sets.
     (3) We propose a similarity model based on the information need domain.
     The main significance of the research work is that it strengthens the theoretical basis of information need and, on this basis, establishes an appropriate similarity model that improves information retrieval performance. The information need domain provides a new research idea, contributes new theories and methods to the field of information retrieval, and improves retrieval performance in practical applications.
