Development of a Hadoop-Based Platform for Massive Network Data Collection and Processing
Abstract
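As the convergence of mobile networks and the Internet deepens, the data services that users rely on are becoming increasingly diverse and have become the main means of information transfer. These service data are carried over the Internet as IP datagrams. Current network quality indicators derived from network management systems cannot effectively control services according to users' behavioral characteristics, nor do they accurately reflect user behavior. To address this, IP packets must be collected continuously, and two things must be studied: a framework for analyzing user behavior characteristics, and a system for evaluating and analyzing the patterns of data services. This work improves the network's ability to predict and perceive service and user characteristics and advances the development of controllable, manageable future networks.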
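     Network packet collection is the foundation for meeting this requirement. It is essential for subsequent data processing and for the analysis of user behavior characteristics, and it will further support the development of controllable, manageable future networks.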
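     As network data collection proceeds, the volume of data keeps growing, and this massive volume shapes the research and design of the processing system. A single database system can no longer handle all of the data analysis and processing work on its own. Storage and processing capacity must therefore be increased to meet the data processing requirements of a big data environment.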
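     Only accurate analysis reveals the value of the data and allows it to serve both the user behavior characteristics analysis framework and research on future networks. Such analysis helps characterize network behavior accurately, guides practical network deployment and effective traffic control, and advances research on service-oriented future Internet architectures and mechanisms.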
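     This paper conducts research in the areas above, covering: (1) packet capture on high-speed links; (2) massive data storage; (3) massive data analysis; (4) analysis and presentation of data characteristics.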
With the convergence of mobile networks and the Internet, the many kinds of data services that users rely on have become the main means of information transfer. These service data are carried over the Internet as IP datagrams. Current network quality indicators based on the network management system (NMS) cannot effectively control services according to the characteristics of user behavior, nor can they reflect the real user experience of the various services. We therefore need to collect IP packets continuously and then study an analysis system for user behavior characteristics and the patterns of data services, so as to improve the network's ability to predict user characteristics and to promote the development of future networks.
     Network packet capture is central to this requirement and is of great significance for the subsequent analysis of the data and of user behavior characteristics.
     Once network data collection begins, massive volumes of data accumulate rapidly, severely testing the resources of database servers. As the data keeps growing, a single database system can no longer complete all of the data analysis and processing work on its own. We therefore need to enhance our data processing capabilities to meet the requirements of a big data environment.
     Only accurate analysis reveals the value of the data and supports the study of user behavior characteristics. Studying the characteristics of Internet data therefore helps to portray network behavior accurately and to guide practical network deployment and traffic control, promoting research on service-oriented future Internet architectures and mechanisms.
     This paper investigates the areas mentioned above, namely: (1) packet capture on high-speed links, (2) massive data storage, (3) massive data analysis, and (4) analysis and presentation of data characteristics.
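The analysis step described in the abstract (continuously collected IP datagrams processed at scale on Hadoop) can be illustrated with a minimal MapReduce-style sketch. It is not the paper's implementation: it assumes each captured packet is a raw IPv4 datagram with the link-layer header already stripped, and it uses per-source-IP byte totals as a hypothetical example metric. The names `parse_ipv4_header`, `map_packet`, `reduce_bytes`, `run_local_job` and `make_ipv4_packet` are illustrative, and the shuffle step is simulated in a single process rather than run on a Hadoop cluster.

```python
import socket
import struct
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Fixed 20-byte IPv4 header (RFC 791): version/IHL, TOS, total length,
# identification, flags/fragment offset, TTL, protocol, checksum, src, dst.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(packet: bytes) -> Tuple[str, str, int, int]:
    """Return (src_ip, dst_ip, protocol, total_length) of a raw IPv4 datagram.

    Options beyond the fixed 20 bytes are ignored and the checksum is not verified.
    """
    if len(packet) < IPV4_HEADER.size:
        raise ValueError("packet shorter than the fixed IPv4 header")
    (version_ihl, _tos, total_length, _ident, _frag,
     _ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 datagram")
    return socket.inet_ntoa(src), socket.inet_ntoa(dst), protocol, total_length


def map_packet(packet: bytes) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical): emit (source IP, datagram length in bytes)."""
    src, _dst, _proto, total_length = parse_ipv4_header(packet)
    yield src, total_length


def reduce_bytes(src: str, lengths: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase (hypothetical): total bytes sent by one source IP."""
    return src, sum(lengths)


def run_local_job(packets: Iterable[bytes]) -> Dict[str, int]:
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for packet in packets:
        for key, value in map_packet(packet):
            groups[key].append(value)  # shuffle: collect all values per key
    return dict(reduce_bytes(key, values) for key, values in sorted(groups.items()))


def make_ipv4_packet(src: str, dst: str, payload: bytes, protocol: int = 6) -> bytes:
    """Build a minimal IPv4 datagram for testing (checksum left as 0)."""
    header = IPV4_HEADER.pack(
        (4 << 4) | 5,                     # version 4, IHL 5 (20-byte header, no options)
        0,                                # TOS
        IPV4_HEADER.size + len(payload),  # total length
        0, 0,                             # identification, flags/fragment offset
        64,                               # TTL
        protocol,                         # 6 = TCP, 17 = UDP
        0,                                # header checksum (not computed here)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    return header + payload


if __name__ == "__main__":
    sample = [
        make_ipv4_packet("10.0.0.1", "10.0.0.2", b"x" * 100),
        make_ipv4_packet("10.0.0.1", "10.0.0.3", b"y" * 50),
        make_ipv4_packet("10.0.0.9", "10.0.0.2", b"z" * 10, protocol=17),
    ]
    print(run_local_job(sample))  # {'10.0.0.1': 190, '10.0.0.9': 30}
```

Keeping the map and reduce functions free of shared state is what lets a framework such as Hadoop run many mapper and reducer instances in parallel. On a real cluster, logic like this could be packaged as a Java MapReduce job or run through Hadoop Streaming, which passes records to executables via standard input and output.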
