Research on a Network Sensitive-Information Monitoring System
Abstract
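With the development of network technology, the Internet has gradually become an indispensable source of knowledge and information. As the network expands, the volume of information has grown sharply and the network has become increasingly open. At the same time, because network resources lack unified management, much unhealthy and even malicious content is mixed in, and network crime is on the rise. China has carried out several nationwide campaigns against obscene and pornographic websites, but beyond law enforcement, information technology is also needed to filter pornographic, violent, reactionary, and other sensitive information and clean up the network environment.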
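     Sensitive information on the network currently takes many forms and spreads in many ways, so a single filtering technique cannot effectively curb it. Against this background, this thesis studies sensitive-information monitoring techniques, mainly for text and images containing pornographic, violent, reactionary, and other sensitive content. It studies the key techniques of network sensitive-information monitoring based on text matching and content-based image recognition, combines the two, builds a text matching model and an image recognition model, and designs a prototype monitoring system.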
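     For text monitoring, given that network text is voluminous and often spread covertly, this thesis combines the WM algorithm with the idea of fuzzy matching, implementing fuzzy matching on top of exact matching, to identify network text containing sensitive words. First, the text is preprocessed to address three covert propagation methods commonly used for sensitive text on the network, which converts fuzzy matching into another form of exact matching. Next, the WM algorithm searches for matching sensitive keywords. If a keyword is found in a web page, the page text is then fuzzy-matched against a similarity threshold to make the final decision on whether the page contains sensitive text.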
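The text stages described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes that "WM" refers to the Wu-Manber multi-pattern search, that the covert forms to undo are full-width characters, mixed case, and separators inserted between characters (the abstract does not name the three forms), and that `difflib` similarity can stand in for the unspecified fuzzy measure. The function names and the demo keywords are hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Undo simple obfuscation so that exact matching can be used.

    Assumed forms (not taken from the source): full-width characters,
    mixed case, and separators inserted between characters.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[\W_]+", "", text)  # drop spaces, punctuation, underscores


def wu_manber_search(text, patterns, block=2):
    """Return (start, pattern) for every occurrence of any pattern in text."""
    patterns = [p for p in patterns if p]
    if not patterns:
        return []
    m = min(len(p) for p in patterns)   # only the first m chars build the tables
    b = min(block, m)                   # block size used for hashing
    default_shift = m - b + 1
    shift, buckets = {}, {}
    for p in patterns:
        for j in range(b, m + 1):
            blk = p[j - b:j]
            shift[blk] = min(shift.get(blk, default_shift), m - j)
        buckets.setdefault(p[m - b:m], []).append(p)

    hits = []
    pos = m - 1                         # index of the last char of the window
    while pos < len(text):
        blk = text[pos - b + 1:pos + 1]
        step = shift.get(blk, default_shift)
        if step == 0:                   # block ends some pattern prefix: verify
            start = pos - m + 1
            for p in buckets.get(blk, []):
                if text.startswith(p, start):
                    hits.append((start, p))
            pos += 1
        else:
            pos += step
    return hits


def best_similarity(text, keyword):
    """Highest difflib ratio between keyword and any same-length window."""
    k = len(keyword)
    windows = (text[i:i + k] for i in range(max(1, len(text) - k + 1)))
    return max(SequenceMatcher(None, w, keyword).ratio() for w in windows)
```

A caller would run `normalize` first, then `wu_manber_search`, and only compute `best_similarity` against a chosen threshold when a keyword was found. How the thesis turns these scores into the final page-level decision is not stated in the abstract, so the sketch leaves the stages separate.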
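     For image monitoring, based on the defining characteristic of pornographic images, namely large areas of exposed skin, this thesis combines skin-color detection with texture discrimination to identify skin regions and generate a mask image. Skin color is the most direct and richest information in an image, and many color spaces can be used for skin-color detection. This thesis combines the YUV and YIQ color spaces: using prior knowledge and rules, the skin-color model is defined by thresholds on the phase angle θ in YUV space and on the I component in YIQ space. Because similar colors cause false detections after the skin-color model is applied, a texture discrimination model based on first-order gray-level statistics, chosen after comparison and analysis, judges whether a point and its neighborhood have the smoothness characteristic of skin. Finally, three statistical features are extracted from the original image according to the mask image produced by skin-color and texture detection, and are used to train an SVM classifier and recognize sensitive images.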
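A rough sketch of the mask-generation step follows, assuming standard RGB to YUV and YIQ conversion coefficients. The threshold ranges for θ and I, the 3x3 window, and the variance limit are illustrative assumptions, since the abstract gives no numeric values. Local gray-level variance is used here as one possible first-order statistic for smoothness, and the function names are hypothetical. The three SVM features are not named in the abstract, so they are not shown.

```python
import math


def is_skin_pixel(r, g, b, theta_range=(105.0, 150.0), i_range=(20.0, 90.0)):
    """Skin-color rule on the YUV phase angle and the YIQ I component.

    The default ranges are assumed for illustration only.
    """
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    theta = math.degrees(math.atan2(v, u)) % 360.0   # phase angle in [0, 360)
    return (theta_range[0] <= theta <= theta_range[1]
            and i_range[0] <= i <= i_range[1])


def gray(r, g, b):
    """Luma (Y) of an RGB pixel."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def local_variance(gray_img, y, x, radius=1):
    """Gray-level variance in the window around (y, x), clipped at borders."""
    h, w = len(gray_img), len(gray_img[0])
    vals = [gray_img[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]
    mean = sum(vals) / len(vals)
    return sum((val - mean) ** 2 for val in vals) / len(vals)


def skin_mask(image, max_variance=100.0):
    """Mask of pixels that pass both the color rule and the smoothness rule.

    image is a list of rows of (r, g, b) tuples; max_variance is an
    assumed smoothness limit.
    """
    gray_img = [[gray(*px) for px in row] for row in image]
    return [[is_skin_pixel(*px)
             and local_variance(gray_img, y, x) <= max_variance
             for x, px in enumerate(row)]
            for y, row in enumerate(image)]
```

The color rule alone accepts any skin-toned pixel; the variance check then rejects pixels whose neighborhood is not smooth, which is the role the texture model plays in the description above.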
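     Finally, this thesis builds an intelligent-agent-based network sensitive-information monitoring system with information feedback and control. The system combines web-page text matching with sensitive-image recognition, judges whether a page contains sensitive content from the distribution of sensitive information in it, and then feeds back the recognition result, acts on it, and records it. The prototype performs web-page text recognition before image recognition, which shortens filtering time and makes the system more responsive in real time.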
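The text-before-image ordering can be illustrated with a small sketch. It assumes that a page flagged by the cheaper text check is decided immediately and that images are analyzed only otherwise; the abstract does not spell out the exact decision rule, and the `Page`, `Verdict`, and `monitor_page` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    url: str
    text: str
    images: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    url: str
    sensitive: bool
    reason: str  # "text", "image", or "clean"


def monitor_page(page: Page,
                 text_check: Callable[[str], bool],
                 image_check: Callable[[str], bool],
                 log: List[Verdict]) -> Verdict:
    """Run the cheap text check first; analyze images only if text is clean."""
    if text_check(page.text):
        verdict = Verdict(page.url, True, "text")
    elif any(image_check(img) for img in page.images):  # stops at first hit
        verdict = Verdict(page.url, True, "image")
    else:
        verdict = Verdict(page.url, False, "clean")
    log.append(verdict)  # record the result for feedback and later handling
    return verdict
```

Because the image check is skipped whenever the text check already fires, the more expensive image analysis runs on fewer pages, which is one way the stated reduction in filtering time could arise.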
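     The contributions of this thesis are as follows. In image recognition, skin-color detection, texture discrimination, and SVM classification are combined, which improves the correct detection rate. In application, the monitoring system couples sensitive-information detection with an intelligent agent rather than with a browser, so it can search automatically within a certain scope.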
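     The thesis has five chapters. Chapter 1, the introduction, briefly presents the research background, significance, and content. Chapter 2 analyzes the network sensitive-information monitoring techniques in common use. Chapter 3 presents the monitoring techniques for sensitive text and builds the text matching model. Chapter 4 describes in detail the monitoring techniques for sensitive images and builds the image recognition model. Chapter 5 designs and implements the architecture and prototype of the network sensitive-information monitoring system.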
With the development of network technology, the Internet has become a necessary source of knowledge and information. As the network grows, the quantity of information available on the Internet keeps rising. However, it lacks uniform management, unrestricted information of every kind proliferates, and criminal and pornographic use of it rises steadily. Although campaigns against obscene and pornographic websites have already been launched in our country, technology is also needed to filter harmful information such as erotic and violent content in order to guarantee a safe and healthy network environment.
     Currently, a single filtering technique cannot effectively stop the transmission of sensitive information, because the types of sensitive information and its forms of communication vary widely. We take this as the background of the paper and study the key techniques based on text matching and image recognition. We combine the two techniques to establish text-matching and image-recognition models and design the sensitive information monitoring system.
     For sensitive text monitoring, given the large volume and covert nature of current network text, we combine the WM algorithm with fuzzy matching to identify text that contains sensitive information. First, we preprocess the text to handle the three hidden forms of text communication, which also converts fuzzy matching into another form of exact matching. Then we use the WM algorithm to search for sensitive keywords in the text. If a keyword appears in the web page, we apply fuzzy matching with a similarity threshold to decide whether the page contains sensitive text.
     For sensitive image monitoring, given that such images show large areas of exposed skin, we use a skin-color detection model and a texture discrimination model to select skin areas and build a binary mask image. Skin color is the most direct and richest information in an image, and many color spaces can be used in skin-color detection algorithms. We combine the YUV and YIQ color spaces for skin-color detection. However, the skin-color model alone may produce false detections because of color similarity. After comparison and analysis, we use a gray-level statistics method to establish the texture discrimination model, which judges whether a point and its surroundings are as smooth as a skin area. Finally, we extract three features as a feature vector to train an SVM classifier and recognize sensitive images.
     The sensitive information monitoring system is based on an intelligent agent. It combines text matching and image recognition to determine whether a web page contains sensitive content, and then records and handles the result as feedback. The prototype performs text matching before image recognition, which shortens filtering time and makes the system more real-time.
     The innovations of the paper are that we combine skin-color detection, texture discrimination, and SVM classification in the image recognition process, which improves recognition accuracy; and that, in application, the monitoring system is integrated with an intelligent agent instead of a browser, so it can search automatically within a certain range.
     The paper consists of five chapters. Chapter 1 introduces the background, significance, and content of the paper. Chapter 2 analyzes current sensitive information monitoring technology. Chapter 3 details sensitive text monitoring technology and establishes the text matching model. Chapter 4 details sensitive image monitoring technology and establishes the image recognition model. Chapter 5 designs and implements the prototype of the sensitive information monitoring system.