A LAN-Based Information Push System
Abstract
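With the rapid development of the Internet and intranets, WWW-based collection, publication, querying and retrieval of online information have brought entirely new concepts to the information society, and Internet/Intranet-based information processing has increasingly become a focus of attention. Against this background, information push technology emerged. The essence of push technology is to let information actively seek out its users, so its strength is initiative: information can be delivered to users without their asking for it. Its weakness is relatively poor accuracy: because a simple filtering mechanism replaces human selection, there is inevitably some gap between the pushed information and what users actually need. Push technology has not succeeded on the Internet, for several reasons. For a network information provider (ISP), the user population is too heterogeneous on the one hand, and bandwidth limits prevent effective pushing on the other. Within a single organization or department, however, the interests of the users in a group are similar, which makes information push feasible.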
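     For this reason, we study a LAN-based information push system. The work is as follows. First, an example-based method is used to obtain a group's interests and build the corresponding interest model. Second, existing search engine technology is used to retrieve a document set from the query keywords submitted by the group. Using the vector space model, the group's interests and the retrieved documents are each represented as a vector of keyword-weight pairs {(k1,v1),(k2,v2),…,(kn,vn)}, the similarity between the two is computed with the cosine formula, and the N documents with the highest similarity are pushed to the users. Finally, in the user feedback unit, two methods, the arithmetic mean and evidence theory, are used to process user feedback. The goal is to combine the interests of all users in the group into an overall rating for each document, so that the initial profile can be revised more effectively and push accuracy improved.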
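The matching step described above can be illustrated with a minimal Python sketch. It assumes that a profile and each document are stored as sparse keyword-to-weight dictionaries. The function names `cosine` and `top_n` and the sample weights are hypothetical and chosen only for illustration; the thesis record does not specify how weights are computed.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two sparse keyword -> weight vectors."""
    # Only keywords present in both vectors contribute to the dot product.
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def top_n(profile, docs, n):
    """Return the ids of the n documents most similar to the profile."""
    ranked = sorted(docs, key=lambda doc_id: cosine(profile, docs[doc_id]),
                    reverse=True)
    return ranked[:n]

# Hypothetical group profile and candidate documents (weights are made up).
profile = {"push": 0.8, "intranet": 0.6}
docs = {
    "d1": {"push": 1.0, "intranet": 1.0},
    "d2": {"push": 1.0, "bandwidth": 1.0},
    "d3": {"weather": 1.0},
}
pushed = top_n(profile, docs, 2)
```

Here `pushed` lists the two documents that would be sent to the group, ordered by decreasing similarity to the profile.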
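     Further work: 1. try other methods for computing group interests and the weights of keywords in documents; 2. test whether the similarity value or the number of pushed documents is the better filtering criterion for improving user satisfaction; 3. further improve the functionality of the LAN-based information push system.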
With the rapid development of the Internet, Internet-based collection, publication and retrieval of information have brought new concepts to our world, and Internet/Intranet information processing has gradually become a focus of attention. Against this background, push technology emerged. The essence of push is to let information find the user, and its advantage lies in this initiative: the system can deliver information to users without being asked. Its disadvantage lies in the inaccuracy of the information. Because a simple screening mechanism replaces human selection, there are bound to be differences between the web pages obtained and the user's real needs. Push technology has not achieved great success on the Internet, for several reasons. For an ISP, the diversity of its users makes the task too complicated on the one hand, and bandwidth is limited on the other. Considering the similarity among users within one unit, it is possible to apply push technology in an intranet.
    We therefore study the Push System based on Intranet (PSI). Our work is as follows. First, the system derives each group's interests from examples and builds a corresponding model. Second, from each group's query keywords, we obtain a set of documents using existing search engines (Google, Baidu). We use the vector space model to represent the group's interests and the returned documents as vectors {(k1,v1),(k2,v2),...,(kn,vn)}, then calculate the similarity between them using the cosine formula. The N pages with the highest similarity are pushed to the group. Finally, in the feedback unit, we use the arithmetic mean and D-S evidence theory to process the feedback of the users in each group. The aim is to synthesize the users' interests into a single value, so that we can better update the initial profile and improve the precision of push.
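The two feedback-fusion methods named above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record does not say how user feedback is turned into numbers, so the sketch assumes numeric ratings for the arithmetic mean and, for D-S evidence theory, a two-element frame {relevant, irrelevant} in which each user supplies a mass function. The names `mean_rating` and `combine` and all sample values are hypothetical.

```python
from itertools import product

def mean_rating(ratings):
    """Arithmetic mean of the users' numeric ratings for one document."""
    return sum(ratings) / len(ratings)

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalise by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

R = frozenset({"relevant"})
I = frozenset({"irrelevant"})
U = R | I  # the whole frame, i.e. "undecided"

# Hypothetical feedback from two users about the same document.
user1 = {R: 0.6, U: 0.4}
user2 = {R: 0.5, I: 0.3, U: 0.2}
fused = combine(user1, user2)
average = mean_rating([4, 5, 3])
```

With more than two users, `combine` can be applied repeatedly, since Dempster's rule is associative. The fused mass on `R` could then serve as the group's overall rating of the document, although the record does not state how the thesis maps fused evidence to a profile update.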
    Further work: 1. trying other ways to calculate the weights of keywords in the users' interest model and in documents; 2. analyzing which index is better, the number of documents pushed or the similarity value; 3. perfecting the push system based on Intranet.
