Research on Key Techniques of Content-Based Video Retrieval

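English title: Research on Some Key Techniques of Video Retrieval Based on Content
Author: Lei Shaoshuai
Degree level: Doctoral
Major: Circuits and Systems
Chinese keywords: video retrieval; shot boundary detection; key frame; feature selection; semantic recognition
English keywords: video retrieval; shot boundary detection; key-frame extraction; feature selection; semantic recognition
Degree year: 2012
Supervisor: Xie Gang
Discipline code: 080902
Degree-granting institution: Taiyuan University of Technology
Submission date: 2012-04-01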

Abstract

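With the widespread use and rapid growth of video information, how to organize and manage video data effectively has become an important and challenging research topic. Because of the particular physical structure of video, traditional text-search methods no longer apply. To turn video retrieval into text retrieval, the video must first be preprocessed: it is segmented into independent shots, several key frames are selected from each shot to represent its content, and the static frames are finally annotated with semantic concepts, so that video retrieval becomes an operation on text. This dissertation studies several key techniques of content-based video retrieval in depth, namely shot boundary detection, key-frame extraction, selection of low-level image features, and image semantic recognition. The main innovations are as follows: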
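     1. In shot boundary detection, existing methods locate boundaries by computing the difference between two adjacent frames, but this difference is sensitive to flashes and to object and camera motion, so detection accuracy is low. This dissertation proposes a shot boundary detection method based on distance separability, which locates boundaries by computing the difference between two adjacent video clips. The method effectively suppresses flashes and object/camera motion, and distinguishes flashes from cuts and motion from gradual transitions.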
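     2. Existing key-frame extraction methods lack spatio-temporal analysis of the video, so it is difficult for them to determine the number and positions of key frames as a whole. This dissertation proposes two key-frame extraction methods that work from the spatio-temporal characteristics of the video. The first method splits a shot into several sub-shots with similar content, then converts key-frame selection for each sub-shot into finding a maximal linearly independent set of a matrix; it determines key frames according to how quickly the sub-shot content changes, reflecting the video dynamics with very low redundancy. The second method constructs spatio-temporal slices to capture the spatio-temporal information of the original video, then extracts key frames through clustering and rule definitions. Both methods incorporate the spatio-temporal characteristics of the video, and their results agree well with human visual perception.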
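     3. Key frames serve two purposes: building video summaries and providing an index for video clips. Most existing methods are oriented towards video summarization; when their results are used for indexing, they are too redundant, which makes retrieval inefficient. This dissertation proposes an indexing-oriented key-frame extraction method that selects key frames by studying camera motion patterns and shot presentation techniques. It first constructs a motion direction histogram and uses it for qualitative analysis of global camera motion, and then extracts key frames according to the characteristics of that global motion. Experimental results show that the method captures the main video content and provides a compact index structure for later retrieval.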
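     4. Existing feature selection methods do not closely couple the knowledge system with classification: they cannot discover or reason about the relationships among data features, nor effectively handle inconsistent or incomplete information to uncover implicit knowledge and reveal underlying patterns. This dissertation applies knowledge reduction to image semantic feature extraction. By constructing an attribute decision table and reducing its attributes without affecting the knowledge it expresses, an effective set of low-level features is extracted, laying the foundation for image semantic recognition.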
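     5. This dissertation examines the classification performance of support vector machines (SVMs) for semantic recognition of natural landscape images. Because SVM classification performance is determined jointly by the kernel function and its parameters, the dissertation analyzes how different kernel functions and parameter optimization algorithms affect the semantic classification of natural landscape images. Finally, an optimized SVM applied to the reduced low-level feature set achieves high recognition accuracy.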
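Item 2 above reduces key-frame selection within a sub-shot to finding a maximal linearly independent set. A minimal sketch, assuming each frame is already summarized by a feature vector (the abstract does not say which features are used) and that a relative tolerance `rel_tol` replaces exact linear dependence, which noisy real features almost never satisfy. Frames are scanned in temporal order and kept when they are not (numerically) in the span of the frames kept so far; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def maximal_independent_frames(features: Sequence[Sequence[float]],
                               rel_tol: float = 1e-6) -> List[int]:
    """Indices of frames whose feature vectors form a maximal linearly
    independent set, scanning the sub-shot in temporal order.

    A frame is kept when the part of its vector not explained by the frames
    already kept (its Gram-Schmidt residual) exceeds rel_tol times its norm.
    """
    basis: List[List[float]] = []  # orthonormal basis spanning the kept frames
    kept: List[int] = []
    for idx, vec in enumerate(features):
        residual = [float(v) for v in vec]
        norm0 = math.sqrt(sum(r * r for r in residual))
        if norm0 == 0.0:
            continue  # an all-zero vector is never linearly independent
        for q in basis:  # remove the components along already-kept frames
            proj = sum(r * b for r, b in zip(residual, q))
            residual = [r - proj * b for r, b in zip(residual, q)]
        norm = math.sqrt(sum(r * r for r in residual))
        if norm > rel_tol * norm0:  # not (numerically) in the span so far
            basis.append([r / norm for r in residual])
            kept.append(idx)
    return kept


if __name__ == "__main__":
    # Frame 1 is a scaled copy of frame 0; frame 3 is a mix of frames 0 and 2.
    frames = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    print(maximal_independent_frames(frames))  # [0, 2, 4]
```

With exact arithmetic the number of kept frames equals the rank of the feature matrix, so a sub-shot whose content changes more tends to yield more key frames; raising `rel_tol` yields fewer, less redundant ones.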
A video is a continuous time series of image frames, an image stream without an explicit data structure. If a video is seen as a book without a table of contents or an index, then an image frame is equivalent to one page of the book. Without a table of contents, people cannot browse and retrieve video efficiently. With the extensive application of video information and the dramatic increase in the number of videos, effective organization and management of video has become a very important and challenging research topic. This dissertation presents an in-depth study of several key technologies in content-based video retrieval, including shot boundary detection, key-frame extraction, image low-level feature selection, and image semantic recognition. The main innovations are summarized as follows:
     In shot boundary detection, existing methods compute the difference between two adjacent frames to locate shot boundaries, but this difference is sensitive to flashes and to object and camera motion. This dissertation presents a shot boundary detection method based on a distance separability criterion, which determines shot boundaries by calculating the difference between two video clips in a sliding window and thereby effectively suppresses the effects of flashes and object/camera motion. The method can effectively distinguish flashes from cut transitions and object/camera motion from gradual transitions.
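The abstract does not give the exact separability criterion. A minimal sketch, assuming a common distance-based form: the squared distance between the mean feature vectors of the clips before and after a candidate boundary, divided by the pooled within-clip scatter. Frame features are assumed to be precomputed vectors (for example colour histograms); the window length `window`, the `threshold` and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _mean(clip: Sequence[Vector]) -> List[float]:
    return [sum(col) / len(clip) for col in zip(*clip)]


def _sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def separability(clip_a: Sequence[Vector], clip_b: Sequence[Vector], eps: float = 1e-9) -> float:
    """Squared distance between clip means over the pooled within-clip scatter."""
    ma, mb = _mean(clip_a), _mean(clip_b)
    scatter = sum(_sqdist(v, ma) for v in clip_a) + sum(_sqdist(v, mb) for v in clip_b)
    within = scatter / (len(clip_a) + len(clip_b))
    return _sqdist(ma, mb) / (within + eps)  # eps avoids 0/0 on perfectly static clips


def detect_cuts(features: Sequence[Vector], window: int = 5, threshold: float = 10.0) -> List[int]:
    """Return frame indices t such that a shot boundary lies between t-1 and t."""
    n = len(features)
    scores: Dict[int, float] = {
        t: separability(features[t - window:t], features[t:t + window])
        for t in range(window, n - window + 1)
    }
    cuts = []
    for t, score in scores.items():
        # Keep scores above the threshold that are also local maxima within +/- window.
        nearby = [scores[u] for u in range(t - window, t + window + 1) if u != t and u in scores]
        if score >= threshold and all(score >= s for s in nearby):
            cuts.append(t)
    return cuts


if __name__ == "__main__":
    # 40 frames of a 1-D feature: a one-frame flash at frame 8 and a cut at frame 20.
    feats = [[0.0] for _ in range(20)] + [[1.0] for _ in range(20)]
    feats[8] = [5.0]
    print(detect_cuts(feats))  # [20]
```

In the demo, the single flash frame inflates the within-clip scatter, so every window containing it scores about 0.5, while the true cut scores far above the threshold. A plain adjacent-frame difference would instead rate the flash (|5 - 0| = 5) above the cut (|1 - 0| = 1).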
     Existing key-frame extraction methods lack spatio-temporal analysis of the video, so it is difficult for them to determine the number and locations of key frames as a whole. This dissertation proposes two methods. The first splits a shot into several sub-shots with similar visual content, followed by spatial and temporal analysis that identifies key frames according to the rate of change of the video content; it effectively reflects the dynamic characteristics of the video. The second constructs a space-time slice to extract the spatio-temporal information of the original video, and then uses K-means clustering and a set of rules to extract key frames. Both methods include spatio-temporal analysis, and their results are consistent with human visual perception.
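A minimal sketch of the second method, assuming grayscale frames given as pixel rows, a horizontal slice taken through the middle row of each frame, and plain k-means with a fixed number of clusters `k`. The abstract specifies neither k nor the follow-up rules, so the rule step is omitted here, and the key frame of each cluster is assumed to be the frame whose slice row lies closest to the cluster centroid. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # grayscale frame as a list of pixel rows


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def horizontal_slice(frames: Sequence[Frame]) -> List[List[float]]:
    """Space-time slice: row t is the middle pixel row of frame t."""
    return [list(frame[len(frame) // 2]) for frame in frames]


def kmeans(points: Sequence[Sequence[float]], k: int,
           iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means, initialised with k points spread evenly over time."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points]
    return labels, centroids


def slice_keyframes(frames: Sequence[Frame], k: int) -> List[int]:
    """One key frame per cluster: the frame whose slice row is closest to the centroid."""
    rows = horizontal_slice(frames)
    labels, centroids = kmeans(rows, k)
    keys = []
    for c in range(k):
        members = [t for t, lab in enumerate(labels) if lab == c]
        if members:
            keys.append(min(members, key=lambda t: _sqdist(rows[t], centroids[c])))
    return sorted(keys)


if __name__ == "__main__":
    # Three 4-frame segments of different brightness; each frame is 3 rows x 4 pixels.
    values = [0, 1, 2, 1, 10, 11, 12, 11, 20, 21, 22, 21]
    frames = [[[v] * 4 for _ in range(3)] for v in values]
    print(slice_keyframes(frames, k=3))  # [1, 5, 9]
```

The slice compresses each frame to one row, so clustering works on a much smaller representation that still follows how the scene changes over time.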
     Key frames serve to construct video summaries and to provide an index for video clips. Existing methods are mostly oriented towards video summarization; when their results are used for video indexing, they are too redundant, resulting in low retrieval efficiency. This dissertation argues that different applications call for different extraction strategies, and uses camera movement and shot presentation techniques to extract key frames for video indexing. It presents a hierarchical camera motion classification algorithm based on a motion direction histogram; on the basis of this qualitative camera motion analysis, shot presentation techniques are then applied to extract key frames effectively. The experimental results show that the method captures the main video information and provides a concise index structure for later retrieval.
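A sketch of a hierarchical classifier of this kind, assuming block motion vectors (dx, dy) are already available (for example from block matching), together with each block's position relative to the frame centre. The three levels (static, translation, zoom), the 8-bin histogram and all thresholds are illustrative assumptions, not the thesis's parameters. The translation label names the direction in which image content moves, which is opposite to the direction of a pure camera pan or tilt.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]

# Bin names in image coordinates (x to the right, y downwards), 45 degrees apart.
DIRECTIONS = ["right", "down-right", "down", "down-left",
              "left", "up-left", "up", "up-right"]


def direction_histogram(vectors: Sequence[Vec2], min_mag: float = 0.5) -> List[float]:
    """Magnitude-weighted, normalised 8-bin histogram of motion directions."""
    hist = [0.0] * 8
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag < min_mag:
            continue  # near-zero vectors carry no reliable direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[round(angle / (math.pi / 4)) % 8] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def classify_camera_motion(blocks: Sequence[Tuple[Vec2, Vec2]], min_mag: float = 0.5,
                           static_ratio: float = 0.8, dominant: float = 0.6,
                           radial: float = 0.7) -> str:
    """blocks: ((x, y), (dx, dy)) pairs, with (x, y) measured from the frame centre."""
    moving = [(p, v) for p, v in blocks if math.hypot(*v) >= min_mag]
    # Level 1: static camera if most blocks barely move.
    if len(blocks) - len(moving) >= static_ratio * len(blocks):
        return "static"
    # Level 2: pan/tilt if one direction bin dominates the histogram.
    hist = direction_histogram([v for _, v in blocks], min_mag)
    peak = max(range(8), key=lambda b: hist[b])
    if hist[peak] >= dominant:
        return "translation:" + DIRECTIONS[peak]
    # Level 3: zoom if vectors point consistently away from (or towards) the centre.
    cosines = [(x * dx + y * dy) / (math.hypot(x, y) * math.hypot(dx, dy))
               for (x, y), (dx, dy) in moving if (x, y) != (0, 0)]
    if cosines:
        mean_cos = sum(cosines) / len(cosines)
        if mean_cos >= radial:
            return "zoom-in"
        if mean_cos <= -radial:
            return "zoom-out"
    return "complex"
```

A pure zoom spreads motion evenly over all eight bins, so no bin dominates at level 2, and the radial test at level 3 then separates zooming in from zooming out.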
     Existing feature extraction methods do not connect the knowledge system with classification; they cannot discover or reason about the relationships among individual data features, nor effectively handle inconsistent or incomplete information to uncover implicit knowledge and reveal underlying patterns. In this dissertation, knowledge reduction based on rough set theory is applied to image semantic feature extraction. By constructing an attribute decision table and reducing its attributes without affecting the knowledge it expresses, an effective low-level feature set is extracted, laying the foundation for image semantic recognition.
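A minimal sketch of rough-set attribute reduction on a decision table. Condition attributes are assumed to be discretized low-level features and the decision is the semantic label. The reduction uses the positive-region dependency degree with a greedy forward search (QuickReduct-style) followed by backward pruning; this is one standard heuristic, not necessarily the reduction algorithm used in the thesis. The toy table and function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple

# One object of the decision table: (discretized condition attributes, decision).
Row = Tuple[Tuple[int, ...], str]


def dependency(table: Sequence[Row], attrs: Iterable[int]) -> float:
    """Rough-set dependency gamma(attrs, D) = |POS_attrs(D)| / |U|."""
    attrs = list(attrs)
    decisions = defaultdict(set)  # equivalence class -> decisions seen in it
    sizes = defaultdict(int)      # equivalence class -> number of objects
    for values, decision in table:
        key = tuple(values[a] for a in attrs)
        decisions[key].add(decision)
        sizes[key] += 1
    # A class belongs to the positive region if all its objects share one decision.
    positive = sum(sizes[k] for k, ds in decisions.items() if len(ds) == 1)
    return positive / len(table)


def quick_reduct(table: Sequence[Row], n_attrs: int) -> List[int]:
    """Greedy forward selection plus a backward pass that drops redundant attributes."""
    full = dependency(table, range(n_attrs))
    reduct: List[int] = []
    while dependency(table, reduct) < full:
        candidates = [a for a in range(n_attrs) if a not in reduct]
        best = max(candidates, key=lambda a: dependency(table, reduct + [a]))
        reduct.append(best)
    for a in list(reduct):
        trial = [b for b in reduct if b != a]
        if dependency(table, trial) == full:
            reduct = trial
    return sorted(reduct)


if __name__ == "__main__":
    # Attributes: 0 = colour bin, 1 = texture bin, 2 = copy of 0, 3 = adds nothing to 0.
    table = [
        ((0, 0, 0, 1), "sky"),
        ((0, 1, 0, 0), "sky"),
        ((1, 0, 1, 1), "grass"),
        ((1, 1, 1, 0), "grass"),
        ((2, 0, 2, 0), "water"),
        ((2, 1, 2, 0), "sky"),
    ]
    print(quick_reduct(table, 4))  # [0, 1]
```

The reduct keeps the dependency degree of the full attribute set (here 1.0) with half the attributes, which is the sense in which reduction leaves the knowledge in the table unaffected.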
     This dissertation investigates the classification performance of support vector machines (SVMs) in the semantic recognition of landscape images. Because SVM classification performance is determined by the kernel function and its parameters, the dissertation analyzes the impact of different kernel functions and parameter optimization algorithms on the semantic recognition of landscape images. Finally, an optimized SVM is applied to the effective low-level feature set, obtaining higher recognition accuracy.
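The comparison of kernels and parameters can be sketched as a validation-set grid search. This is a stand-in under stated assumptions: the SVM is trained with kernelized Pegasos (a stochastic sub-gradient solver for a bias-free SVM) rather than a QP/SMO solver, an exhaustive grid replaces the parameter optimization algorithms compared in the thesis (the abstract does not name them), and toy XOR-style 2-D points stand in for the reduced image features. All names and grid values are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Sequence[float]
Kernel = Callable[[Vec, Vec], float]


def _dot(x: Vec, z: Vec) -> float:
    return sum(a * b for a, b in zip(x, z))


# Kernel families parameterised by gamma (the linear kernel ignores it).
KERNELS: Dict[str, Callable[[float], Kernel]] = {
    "linear": lambda g: _dot,
    "poly": lambda g: (lambda x, z: (g * _dot(x, z) + 1.0) ** 2),
    "rbf": lambda g: (lambda x, z: math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))),
}


def train_pegasos(X: Sequence[Vec], y: Sequence[int], kernel: Kernel, lam: float,
                  epochs: int = 20, seed: int = 0) -> Callable[[Vec], int]:
    """Kernelized Pegasos: stochastic sub-gradient training of a bias-free SVM."""
    rng = random.Random(seed)
    n = len(X)
    gram = [[kernel(a, b) for b in X] for a in X]
    alpha = [0] * n  # number of times each sample violated the margin
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass per epoch
            t += 1
            score = sum(alpha[j] * y[j] * gram[j][i] for j in range(n))
            if y[i] * score / (lam * t) < 1.0:
                alpha[i] += 1

    def predict(x: Vec) -> int:
        s = sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(n))
        return 1 if s >= 0 else -1

    return predict


def accuracy(predict: Callable[[Vec], int], X: Sequence[Vec], y: Sequence[int]) -> float:
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def grid_search(X_tr, y_tr, X_val, y_val, gammas: List[float],
                lams: List[float]) -> Tuple[float, str, float, float]:
    """Try every (kernel, gamma, lambda) and keep the best validation accuracy."""
    best = (-1.0, "", 0.0, 0.0)
    for name, make_kernel in KERNELS.items():
        for g in gammas:
            for lam in lams:
                model = train_pegasos(X_tr, y_tr, make_kernel(g), lam)
                acc = accuracy(model, X_val, y_val)
                if acc > best[0]:
                    best = (acc, name, g, lam)
    return best


# Toy XOR-style data (labels +1/-1) standing in for reduced image features.
CORNERS = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
X_tr = [(sx * a, sy * b) for sx, sy in CORNERS for a, b in [(1.0, 1.0), (1.1, 0.9)]]
y_tr = [sx * sy for sx, sy in CORNERS for _ in range(2)]
X_val = [(sx * 0.95, sy * 1.05) for sx, sy in CORNERS]
y_val = [sx * sy for sx, sy in CORNERS]

if __name__ == "__main__":
    print(grid_search(X_tr, y_tr, X_val, y_val, gammas=[0.5, 2.0], lams=[0.01]))
```

On this toy data no linear decision function through the origin can label all four validation points correctly (the points come in pairs x and -x with the same label), whereas the RBF kernel separates the four clusters. This illustrates the kind of kernel dependence the abstract describes.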

