Technical Research on a Search Engine Based on a Dynamic Knowledge Base

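English title: Research of Dynamic Knowledge Based Information Retrieval System
Author: Li Qing (李清)
Thesis level: Master's
Discipline: Computer Software and Theory
Chinese keywords: 信息检索 ; 动态知识库 ; 查询扩展 ; 词的不匹配 ; 局部上下文分析法 ; 包含方法 ; 相似方法
English keywords: IR (information retrieval) ; dynamic knowledge ; query expansion ; word mismatch ; LCA (Local Context Analysis) ; subsumption approach ; resemble approach
Degree year: 2002
Advisor: Wang Huijin (王会进)
Discipline code: 081202
Degree-granting institution: Jinan University (暨南大学)
Thesis submission date: 2002-04-01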

Abstract

The main task of full-text information retrieval (IR) is to find, within a very large document collection, the subset of documents relevant to a user's query. Implementing a retrieval system means dealing with two vocabulary-related problems, the "original expression" problem and the "difference of expression" problem, both of which strongly affect retrieval effectiveness. Many solutions have been proposed, each with its advantages and disadvantages, and some shortcomings remain. Building on a review of these existing approaches, this thesis discusses and implements DKIRS (Dynamic Knowledge base Information Retrieval System), a full-text retrieval system that tries to address both vocabulary problems through an automatically constructed dynamic knowledge base, and in doing so compensates to some degree for the shortcomings of existing methods. We use local context analysis (LCA) to extract feature terms related to the query from the retrieved documents, then apply the subsumption approach and the resemble (similarity) approach to determine the relationships between those terms. The extracted concepts, the relationships among them, and the user's feedback are used to build and update the knowledge base dynamically. Whenever a user submits a query, the system pulls related concepts from the dynamic knowledge base to expand it. Preliminary test results indicate that the structure of the knowledge base built by the system is reasonable and that retrieval precision improves to some extent.
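
To make the subsumption step and the query expansion step in the abstract more concrete, here is a minimal Python sketch. It is not the thesis's implementation. It assumes the commonly cited formulation of subsumption, in which term x subsumes term y when P(x|y) >= 0.8 and P(y|x) < 1, with probabilities estimated from document co-occurrence. The thesis may use a different threshold or a revised variant. The function names `subsumption_pairs` and `expand_query`, the 0.8 default, and the toy documents are all assumptions made for illustration.

```python
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    docs: list of sets of terms, one set per retrieved document.
    Assumed rule: x subsumes y when P(x|y) >= threshold and P(y|x) < 1.
    """
    # Map each term to the set of document indices that contain it.
    postings = {}
    for i, terms in enumerate(docs):
        for t in terms:
            postings.setdefault(t, set()).add(i)

    pairs = []
    for x, y in permutations(sorted(postings), 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])
        p_y_given_x = both / len(postings[x])
        if p_x_given_y >= threshold and p_y_given_x < 1:
            pairs.append((x, y))
    return pairs


def expand_query(query_terms, pairs):
    """Append the broader and narrower terms linked to each query term."""
    expanded = list(query_terms)
    for parent, child in pairs:
        if child in query_terms and parent not in expanded:
            expanded.append(parent)
        if parent in query_terms and child not in expanded:
            expanded.append(child)
    return expanded


# Toy "retrieved documents", each reduced to a set of feature terms.
docs = [
    {"retrieval", "query", "expansion"},
    {"retrieval", "query"},
    {"retrieval", "index"},
    {"retrieval", "query", "feedback"},
]
pairs = subsumption_pairs(docs)
print(pairs)
print(expand_query(["query"], pairs))
```

In this toy collection "retrieval" appears in every document that contains "query" but not the other way round, so "retrieval" is treated as the broader concept. A system like the one described would store such pairs in its knowledge base and update them as new results and user feedback arrive; that persistence and feedback handling is left out here.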
