Research on Video Scene Classification Techniques Based on Feature Extraction (基于特征提取的视频场景分类技术研究)

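English title: Research on Video Scene Classification Techniques Based on Feature Extraction
Author: Li Ling (李凌)
Degree level: Master's
Discipline: Electronics and Communication Engineering
Chinese keywords: 视频监控 ; 视频场景分类 ; 光流 ; LDA ; k-means聚类 ; 远程医疗 ; ActiveX
English keywords: surveillance ; video scene classification ; optical flow ; LDA ; k-means clustering ; telemedicine ; ActiveX
Degree year: 2012
Advisor: Yang Hua (杨华)
Discipline code: 081002
Degree-granting institution: Shanghai Jiao Tong University (上海交通大学)
Thesis submission date: 2011-12-01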

Abstract

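As public awareness of safety has grown, video surveillance systems have come into wide use. Traditional video surveillance relies on human operators watching the footage. This approach consumes a great deal of manpower and time and is inefficient; it also depends on the operator's attention and subjective judgment, so the results cannot be guaranteed to be correct. Against this background, research on algorithms for intelligent video analysis is particularly important. Video scene classification is one such algorithm, and a basic yet important one. It supplies valuable reference information for surveillance work, which greatly reduces the manual monitoring workload and improves the accuracy of monitoring results. At present it is mainly used to assist human monitoring, to manage video data, and to support deeper video analysis.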
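     Video scene classification algorithms fall into two main categories: tracking-based and feature-extraction-based. Tracking-based classification is the traditional approach; when a scene contains too many moving objects, occlusion between objects and the complexity of the tracked trajectories degrade its performance. Feature-extraction-based classification handles this situation well. This thesis focuses on feature-extraction-based video scene classification and, building on a summary and analysis of the current mainstream algorithms of this kind, carries out the following research.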
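     First, to address the shortcomings of traditional optical flow quantization schemes, this thesis proposes an adaptive optical flow quantization scheme. Optical flow is the most commonly used feature extraction approach and currently the most effective. Traditional algorithms generally apply a simple, direct fixed quantization, which ignores how the optical flow is distributed and therefore extracts the video information carried by the flow vectors incompletely. Building on a close study of these issues, this thesis proposes an adaptive quantization method that, for different regions of the distribution of flow position and direction, selects partitions of differing density according to the characteristics of that distribution. Experiments show that the scheme extracts more video information from the raw flow vectors, so that the quantized vectors match the original video characteristics more closely, which effectively improves algorithm performance.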
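The contrast between fixed and adaptive quantization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the thesis's actual partition rule, so equal-frequency (quantile) binning stands in for it here, only the flow direction is quantized (the thesis also adapts to position), and the names `fixed_edges`, `adaptive_edges`, `quantize` and `flow_angle` are hypothetical.

```python
import bisect
import math

def flow_angle(dx, dy):
    """Direction of a flow vector (dx, dy), mapped into [0, 2*pi)."""
    return math.atan2(dy, dx) % (2 * math.pi)

def fixed_edges(n_bins, lo=0.0, hi=2 * math.pi):
    """Interior bin edges for uniform (fixed) quantization of [lo, hi)."""
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def adaptive_edges(samples, n_bins):
    """Interior bin edges at empirical quantiles of the samples (assumed rule),
    so densely populated ranges are split more finely than sparse ones."""
    ordered = sorted(samples)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def quantize(value, edges):
    """Index of the bin that contains value."""
    return bisect.bisect_right(edges, value)

# Most directions cluster near 0.5 rad; two outliers point elsewhere.
angles = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 3.0, 5.0]
print([quantize(a, fixed_edges(4)) for a in angles])             # [0, 0, 0, 0, 0, 0, 1, 3]
print([quantize(a, adaptive_edges(angles, 4)) for a in angles])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With four fixed bins the six clustered directions collapse into one code word, while the quantile-based edges spread them over three, which is the kind of information loss the adaptive scheme is meant to reduce.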
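     Second, this thesis proposes an improved k-means clustering post-processing algorithm. Applying traditional k-means clustering directly in video scene classification has two shortcomings, and the thesis proposes a remedy for each. First, traditional k-means selects its initial cluster centers at random, which makes both the clustering result and the running efficiency unstable. The thesis therefore proposes a statistics-based method for finding the centers: using the statistical regularities of the raw data, it finds several data points that lie far from one another and uses them as the initial cluster centers. This speeds up convergence and improves performance. Second, because traditional k-means considers only the Euclidean distance, clustering tends to produce spherical clusters. To solve this, the thesis introduces adjustment parameters into the Euclidean distance. The parameters are obtained by analyzing how the coordinate values of the whole data set are distributed along each axis, finding the differences in how much each axis contributes to the overall distance between data points, and using those differences to define a parameterized Euclidean distance. The new distance effectively avoids the spherical-cluster defect and makes the final clusters agree better with people's natural intuition about how the data group. Experiments show that the improved k-means algorithm avoids spherical clustering and unstable performance, and thereby improves the overall performance of the algorithm.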
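Both ideas, deterministic seeding from mutually distant points and a per-axis weighted Euclidean distance, can be sketched as follows. The abstract does not state the thesis's formulas, so two common stand-ins are assumed: farthest-point seeding for the initial centers and inverse-variance weights for the axes. All function names are hypothetical.

```python
import math

def axis_weights(points):
    """Per-axis weights, assumed here to be the inverse variance, so that an
    axis with a large spread does not dominate the distance."""
    n, dims = len(points), len(points[0])
    weights = []
    for d in range(dims):
        mean = sum(p[d] for p in points) / n
        var = sum((p[d] - mean) ** 2 for p in points) / n
        weights.append(1.0 / var if var > 0 else 1.0)
    return weights

def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per axis."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def farthest_point_init(points, k, weights):
    """Deterministic seeding: start with the point farthest from the data mean,
    then keep adding the point farthest from its nearest chosen center."""
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    centers = [max(points, key=lambda p: weighted_distance(p, mean, weights))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(
            weighted_distance(p, c, weights) for c in centers)))
    return centers

def kmeans(points, k, weights=None, max_iter=100):
    """Lloyd iterations using the weighted distance and deterministic seeds."""
    if weights is None:
        weights = axis_weights(points)
    centers = farthest_point_init(points, k, weights)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: weighted_distance(
            p, centers[j], weights)) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

# Two elongated groups: wide spread along x, separated only along y.
pts = [(0, 0), (10, 0), (20, 0), (10, 1), (20, 1), (30, 1)]
print(kmeans(pts, 2)[0])                      # [0, 0, 0, 1, 1, 1]
print(kmeans(pts, 2, weights=[1.0, 1.0])[0])  # [0, 0, 1, 0, 1, 1]
```

On this toy data the unweighted distance splits the points by their x coordinate, while the weighted distance recovers the two elongated groups. Because the seeding involves no randomness, repeated runs give the same result.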
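     Finally, this thesis implements the client of a telemedicine system based on ActiveX and web pages. It studies the transmission characteristics of physiological parameters and designs and implements a simple, efficient transport protocol for them, which ensures the accuracy of the data and the stability of the data stream during transmission. On this basis, it designs and implements a client that receives and displays physiological parameters. The client is built with ActiveX technology, mainly on the MFC framework. The thesis also designs a one-way voice system which, combined with an H.264 audio-visual system, forms a two-way voice system, and it gives the design and the concrete implementation approach for this voice system. In addition, it provides a simple design for the client system's web pages, including user management and parameter configuration modules. The thesis closes with demonstration figures of the complete system and test results for video delay and bit rate, which demonstrate the effectiveness of the whole system.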
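The abstract does not describe the frame layout of the physiological parameter protocol, so the following is a purely hypothetical sketch of how such a protocol might protect data accuracy (a checksum) and stream continuity (a sequence number). The field layout, the `PP` marker and the function names are all assumptions, and the example is written in Python rather than the MFC/ActiveX setting of the thesis.

```python
import struct
import zlib

# Hypothetical header: 2-byte marker, 16-bit sequence number,
# 8-bit parameter id, 16-bit payload length (network byte order).
HEADER = struct.Struct("!2sHBH")
MAGIC = b"PP"

def encode_frame(seq, param_id, samples):
    """Pack signed 16-bit samples into one frame with a CRC-32 trailer."""
    payload = struct.pack("!%dh" % len(samples), *samples)
    body = HEADER.pack(MAGIC, seq & 0xFFFF, param_id, len(payload)) + payload
    return body + struct.pack("!I", zlib.crc32(body))

def decode_frame(frame):
    """Return (seq, param_id, samples); raise ValueError on a damaged frame."""
    if len(frame) < HEADER.size + 4:
        raise ValueError("frame too short")
    body = frame[:-4]
    (crc,) = struct.unpack("!I", frame[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    magic, seq, param_id, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if magic != MAGIC or length != len(payload) or length % 2:
        raise ValueError("malformed header")
    samples = list(struct.unpack("!%dh" % (length // 2), payload))
    return seq, param_id, samples

frame = encode_frame(7, 1, [72, 73, -5])
print(decode_frame(frame))  # (7, 1, [72, 73, -5])
```

A receiver built this way can drop frames whose checksum fails and can notice lost frames from gaps in the sequence numbers.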

Abstract (author's English version)

With growing awareness of public safety, video surveillance systems are now widely used. Traditional video surveillance relies on human monitoring, which is inefficient and wastes valuable manpower and time. It also depends heavily on the operator's attention and subjective judgment, so the results cannot be guaranteed to be correct. Against this background, research on video analysis algorithms is particularly important. Video scene classification is a fundamental and important video analysis algorithm. It provides important reference information for video monitoring, which greatly reduces the monitoring workload and increases the accuracy of the results. At present the technique is mainly used to assist human monitoring, to manage video data, and to support other video analysis techniques.
     Video scene classification algorithms fall into two categories: tracking-based and feature-extraction-based. Tracking-based classification is the traditional approach. When a scene contains too many moving objects, it performs poorly because of occlusion and the complexity of the tracked trajectories. Feature-extraction-based classification handles this problem well. This thesis focuses on feature-extraction-based video scene classification and, based on a summary and analysis of the currently popular algorithms, carries out the following work.
     Firstly, this thesis proposes an adaptive optical flow quantization method that addresses the deficiencies of traditional optical flow quantization. Optical flow is the most popular feature extraction approach and also the best performing. Traditional algorithms generally use a simple, direct fixed quantization. This does not take the distribution of the optical flow into account and therefore loses some of the information in the flow vectors. Building on this analysis, the thesis proposes an adaptive quantization method that chooses different partition and quantization schemes according to the distribution characteristics of the optical flow. Experiments show that the method extracts more video information from the original flow vectors, brings the quantized vectors closer to the original video characteristics, and improves the algorithm's performance.
     Secondly, this thesis proposes an improved k-means clustering algorithm. Traditional k-means clustering has two defects when used for video scene classification, and the thesis gives a solution for each. On the one hand, traditional k-means chooses the initial cluster centers at random, which makes the performance and efficiency of the algorithm unstable. The thesis proposes a statistics-based method for finding suitable initial centers: it analyzes the statistical characteristics of the original data and chooses points that are far from each other as the initial cluster centers. This increases the convergence speed and the performance of the algorithm. On the other hand, traditional k-means considers only the Euclidean distance, which usually leads to globular clusters. To solve this problem, the thesis introduces adjustable parameters into the distance used by k-means. It analyzes how much each axis contributes to the distances between data points and uses the differences to define a new, parameterized Euclidean distance. Clustering with the new distance avoids globular clustering. Experiments show that these two methods substantially improve the performance of the algorithm.
     Finally, this thesis designs and implements the client of a telemedicine system based on ActiveX and the web. It studies the transmission characteristics of physiological parameters and designs a transport protocol that ensures the accuracy and stability of the transmitted parameters. On this basis, it designs and implements a client that receives and displays physiological data. The client is implemented with ActiveX, with the help of the MFC framework. The thesis also designs a one-way voice sub-system that, combined with an H.264 system, forms a two-way voice system, and it describes the design and implementation of this sub-system. In addition, it designs the web pages of the client system, including the user management and configuration modules. Finally, the thesis presents the whole system and reports video delay and bit rate measurements, which show that the system is effective and feasible.
