Research on Method of Video Structure Mining Based on Content (基于内容的视频结构挖掘方法研究)

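English title: Research on Method of Video Structure Mining Based on Content
Author: Fu Changjian (付畅俭)
Thesis level: Doctoral
Discipline: Control Science and Engineering
Chinese keywords: 多媒体数据挖掘 ; 视频挖掘 ; 视频结构挖掘 ; 视频基本结构挖掘 ; 视频结构语法挖掘 ; 视频结构语义挖掘 ; 视频关联规则挖掘
English keywords: Multimedia Data Mining ; Video Mining ; Video Structure Mining ; Video Basic Structure Mining ; Video Structure Syntax Mining ; Video Structure Semantics Mining ; Video Association Rule Mining
Degree year: 2008
Supervisor: Li Guohui (李国辉)
Discipline code: 081103
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2008-09-01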

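Abstract

The rapid development of multimedia technology has produced large volumes of video data, creating an urgent need for effective techniques to manage, interpret, and use it. Applying ideas from data mining, this thesis explores high-level structural knowledge of video from two perspectives, syntax and semantics. It mines the valuable, comprehensible semantic information and pattern knowledge embedded in video structure, for use in organizing and managing video databases, content-based personalized video recommendation, and video summarization based on structure syntax and semantics. The main research content and contributions are as follows:
     (1) Theoretical study of the concepts and methods of video structure mining. Building on traditional data mining and multimedia data mining, the thesis explicitly proposes video structure mining, establishes its conceptual framework, and gives formal definitions of video basic structure mining, structure syntax mining, and structure semantics mining. It defines a system architecture for video structure mining with five parts: video data preprocessing, video database construction, multidimensional analysis of video data, video mining function modules, and a video mining interface. It also defines a functional architecture with six components: data preprocessing, basic structure mining, structure syntax mining, structure semantics mining, pattern evaluation, and knowledge presentation. Basic structure mining is the foundation for structure syntax mining and structure semantics mining, while the latter two complement and reinforce each other.
     (2) Content-based video basic structure mining methods. For the two core tasks of basic structure mining, the thesis proposes a shot segmentation algorithm and a scene segmentation algorithm. These yield a hierarchical video structure comprising frames, shots, scenes, and the program itself, which structures the video and provides a solid basis for further mining the structure syntax and structure semantics hidden in the basic structure. A framework for video basic structure mining is defined, covering shot segmentation, key frame extraction, shot feature extraction, and scene segmentation. Using non-uniform quantization of the HSV color space, the thesis proposes an adaptive dual-histogram, two-pass decision algorithm for shot segmentation. Shot similarity is computed from the HSV color histogram, the homogeneous texture descriptor (HTD), and the edge histogram descriptor (EHD); scene construction is then approached from two directions, merging and splitting, using multi-feature shot clustering and force competition respectively. The role of audio in video structure mining is also discussed, and a method is proposed that segments news story units using voiceprint features in news video.
     (3) Content-based video structure syntax mining methods. A framework for video structure syntax mining is defined. On the basis of shot segmentation, an improved FSCL algorithm is proposed for unsupervised shot clustering, converting the video stream into a symbol sequence. Items in video association rules are order-dependent and time-dependent and lack an explicit notion of a transaction. To handle these properties, the traditional Apriori algorithm is modified, and a video association rule mining algorithm is proposed that computes support over a time-based window. Its frequent sets are used to explore periodic or semi-periodic structure syntax patterns in video. Syntactic pattern recognition commonly uses either string matching or string parsing; to address the limitations of string matching, an HMM-based pattern mining method is proposed to parse high-level video events, and it is used to recognize and locate free-throw events in basketball video.
     (4) Content-based video structure semantics mining methods. The thesis proposes a video structure semantics model with three levels and two mappings, and uses it to explore ways of bridging the "semantic gap" between low-level video features and high-level semantics (user needs). Shot-level semantic concepts are added between low-level features and user needs, giving three levels. Combined with a semantic concept network model, a multi-concept discriminative random field model for video shots is built to map low-level features to shot-level semantic concepts; it exploits the interactions among concepts to improve the precision of shot-level concept annotation. Using the syntactic structure knowledge obtained from structure syntax mining, and taking shot-level semantic concept cues as observations, an HHMM model is built that maps shot-level semantic concepts to high-level video semantic events by event inference.
     In summary, the thesis focuses on content-based video structure mining. It establishes the theory and framework of video structure mining and explores video mining methods and applications at three levels: basic structure, structure syntax, and structure semantics, achieving results in both theory and application. These results have practical value and are expected to benefit multimedia data mining.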
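
Item (2) above describes shot segmentation built on non-uniformly quantized HSV histograms and adaptive thresholds. The abstract does not give the bin edges, the threshold rule, or the details of the dual-histogram, two-pass decision, so the following Python sketch illustrates only the general idea under assumed values: a 72-bin quantization (8 hue x 3 saturation x 3 value bins), a histogram-intersection distance, and a local mean-plus-k-standard-deviations threshold. The names `quantize_hsv`, `histogram`, `hist_diff`, and `detect_cuts` and the parameters `window`, `k`, and `floor` are hypothetical and do not come from the thesis.

```python
import colorsys
from statistics import mean, pstdev

# Assumed non-uniform hue boundaries in degrees (illustrative, not from the thesis).
HUE_EDGES = (20, 40, 75, 155, 190, 270, 295, 316)


def quantize_hsv(r, g, b):
    """Map an RGB pixel (0-255 per channel) to one of 72 HSV bins."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = h * 360.0
    h_bin = 0  # hues at or above 316 degrees wrap around into bin 0
    for i, edge in enumerate(HUE_EDGES):
        if hue_deg < edge:
            h_bin = i
            break
    # Assumed unequal saturation/value intervals: [0, 0.2), [0.2, 0.7), [0.7, 1].
    s_bin = 0 if s < 0.2 else (1 if s < 0.7 else 2)
    v_bin = 0 if v < 0.2 else (1 if v < 0.7 else 2)
    return 9 * h_bin + 3 * s_bin + v_bin


def histogram(frame):
    """Normalized 72-bin histogram of a non-empty frame given as (r, g, b) pixels."""
    hist = [0.0] * 72
    for r, g, b in frame:
        hist[quantize_hsv(r, g, b)] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def hist_diff(h1, h2):
    """Distance in [0, 1]: one minus the histogram intersection."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))


def detect_cuts(frames, window=5, k=3.0, floor=0.3):
    """Return the indices of frames that start a new shot."""
    hists = [histogram(f) for f in frames]
    diffs = [hist_diff(a, b) for a, b in zip(hists, hists[1:])]
    cuts = []
    for i, d in enumerate(diffs):
        lo, hi = max(0, i - window), min(len(diffs), i + window + 1)
        neighbors = diffs[lo:i] + diffs[i + 1:hi]
        # Adaptive threshold from local statistics, never below a fixed floor.
        local = mean(neighbors) + k * pstdev(neighbors) if neighbors else 0.0
        if d > max(floor, local):
            cuts.append(i + 1)
    return cuts
```

As a usage example, a clip of five all-red frames followed by five all-blue frames, each frame given as a short list of `(r, g, b)` tuples, yields a single cut at frame index 5. Gradual transitions such as fades and dissolves would need additional handling that this sketch omits.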

English abstract

Advances in multimedia technologies have yielded a vast amount of video data, which calls for efficient and flexible methods to annotate, organize, store, and access video resources. Video mining, which has attracted much research interest in recent years, is the process of discovering implicit, previously unknown knowledge or interesting patterns in a massive set of video data. Using data mining, this thesis explores the higher-level structural knowledge of video from two perspectives, syntax and semantics. The main content and innovations are as follows:
     (1) Theoretical research on the concepts and methods of video structure mining. Based on the theories of traditional data mining and multimedia data mining, the concepts of video structure mining are defined explicitly. Video structure mining comprises basic structure mining, structure syntax mining, and structure semantics mining. Basic structure mining is the foundation of structure syntax mining and structure semantics mining, and the latter two complement each other. A system structure for video structure mining is proposed, consisting of video data preprocessing, video database construction, multidimensional analysis of video data, video mining function modules, and a video mining interface. A functional structure is also proposed, consisting of data preprocessing, basic structure mining, structure syntax mining, structure semantics mining, pattern evaluation, and knowledge representation.
     (2) Research on content-based video basic structure mining methods. To obtain the hierarchical structure of a video, which consists of frames, shots, scenes, and the video program, a framework for video basic structure mining is proposed. The framework includes shot boundary detection, key frame selection, shot feature extraction, and scene segmentation. The thesis focuses on the algorithms for shot boundary detection and scene segmentation, which structure the video stream. Using non-uniform quantization of the HSV (hue, saturation, value) color space, a shot boundary detection algorithm with adaptive thresholds, based on two histograms and a two-pass decision, is proposed to partition a video into shots. Based on the similarity between shots in terms of the HSV color histogram, the homogeneous texture descriptor (HTD), and the edge histogram descriptor (EHD), two scene construction methods are developed: a shot clustering approach based on multiple features and a shot segmentation approach based on force competition. The thesis also discusses audio assistance in video structure mining and presents a method for segmenting news story units by identifying speakers from voice features in news videos.
     (3) Research on content-based video structure syntax mining methods. A framework for video structure syntax mining is presented. Building on shot segmentation, an improved frequency sensitive competitive learning (FSCL) method is put forward to cluster shots without supervision and transform the video stream into a symbol sequence. Items in a video symbol sequence are order-dependent and time-dependent, and the sequence has no explicit notion of a transaction. To handle these characteristics, the traditional Apriori algorithm is modified to compute support over a temporal window, giving a video association rule mining algorithm whose frequent sets, mined from the cluster sequence, reveal periodic or semi-periodic structure syntax patterns in videos.
     (4) Research on content-based video structure semantics mining methods. A video structure semantics model composed of three semantic levels and two inter-level mappings is presented to bridge the semantic gap between low-level features and high-level semantics. Between the low-level features and the high-level user needs, the model adds shot-level semantic concepts. For the mapping from low-level features to shot-level semantic concepts, the thesis applies the discriminative random field (DRF) model to multi-concept shot annotation and puts forward multi-concept discriminative random field (MDRF) and generalized MDRF models to detect semantic concepts in video shots. In the framework for mining higher-level semantic events, structure syntax knowledge extracted by structure syntax mining determines the model structure, shot-level semantic concept cues serve as the model observations, and hierarchical hidden Markov models (HHMMs) are built and trained to infer events from the cues. This event inference realizes the mapping from shot-level semantic concepts to higher-level video semantic events.
     This thesis focuses on content-based video structure mining. It establishes the theory and framework of video structure mining and explores its methods and applications at three levels: video basic structure mining, video structure syntax mining, and video structure semantics mining. The results are expected to benefit multimedia data mining and to offer theoretical and practical value for related research.
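
Item (3) describes an Apriori-style algorithm that computes support over a temporal window, because a shot symbol sequence has no explicit transactions. The abstract does not define the window or the support measure, so the Python sketch below assumes one plausible reading: the support of an ordered pattern is the fraction of sliding windows (of a fixed number of shots) in which the pattern occurs in order, and candidates are grown level by level from frequent shorter patterns. The names `occurs_in_order`, `window_support`, and `mine_patterns` and all parameters are hypothetical and are not taken from the thesis.

```python
def occurs_in_order(pattern, window):
    """Return True if `pattern` appears in `window` as an ordered subsequence."""
    it = iter(window)
    # `sym in it` advances the iterator, so later symbols must appear later.
    return all(sym in it for sym in pattern)


def window_support(sequence, pattern, width):
    """Fraction of sliding windows of `width` symbols containing `pattern` in order."""
    n_windows = len(sequence) - width + 1
    if n_windows <= 0:
        return 0.0
    hits = sum(
        occurs_in_order(pattern, sequence[i:i + width]) for i in range(n_windows)
    )
    return hits / n_windows


def mine_patterns(sequence, width, min_support, max_len=3):
    """Level-wise (Apriori-style) search for frequent ordered patterns.

    A pattern can only be frequent if its prefix is frequent, because any
    window containing the pattern also contains the prefix. Candidates are
    therefore built only by extending patterns that survived the last level.
    """
    frequent = {}
    level = [(s,) for s in sorted(set(sequence))]
    for _ in range(max_len):
        survivors = []
        for pat in level:
            sup = window_support(sequence, pat, width)
            if sup >= min_support:
                frequent[pat] = sup
                survivors.append(pat)
        freq_symbols = [p[0] for p in frequent if len(p) == 1]
        level = [pat + (s,) for pat in survivors for s in freq_symbols]
        if not level:
            break
    return frequent
```

For instance, with the strictly periodic shot-label sequence `"ABCABCABC"` and a window of three shots, the ordered pattern `("A", "B")` occurs in 5 of the 7 windows, so it passes a minimum support of 0.7, whereas `("A", "B", "C")` occurs in only 3 of 7 and is pruned.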

引文

[1]李国辉,汤大权,武德峰.信息组织与检索.北京:科学出版社, 2003.
    [2] Pan Jia-Yu, Faloutsos Christos. Video Graph: A new tool for video mining and classification. JCDL, Roanoke, Virginia, USA, June 2001:116~117.
    [3]原野.基于数据挖掘的视频分类和检索研究.博士学位论文,西安交通大学,2003.
    [4] Cavet R., Volmer S., Leopold E., et al. Revealing the connoted visual code: a new approach to video classification. Computers & Graphics-Uk,2004. 28(3):361~369.
    [5] Ma Y. F., Zhang H. J. Motion pattern-based video classification and retrieval. Eurasip Journal on Applied Signal Processing,2003,(2):199~208.
    [6] Zhu X, Wu X, Elmagarmid A K, et al. Video data mining: semantic indexing and event detection from the association perspective. IEEE Trans. Knowledge and Data Engineering, 2005, 17(5): 665~677.
    [7] Zhu X Q. Mining video associations for efficient database management. Multimedia Systems, 2003, 9(6):31~53.
    [8] http://dimacs.rutgers.edu/Workshops/Video/. 2005-03-04.
    [9] Chen Hu-Ching, Shyu Mei-Ling, Zhang Chengcui. Multimedia data mining for traffic video sequences. ACM SIGKDD, San Francisco, USA, 2001: 78~86.
    [10] Michael C.Burl. Mining patterns of activity from video data. ICDM, 2004.
    [11] Stauffer C, Eric W, Grimson L. Learning patterns of activity using real time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence ,2000 ,22 (8) :747~757.
    [12] Zhu Xingquan, Fan Jianping, Aref Walid G. ClassMiner: Mining medical video content structure and events towards efficient access and scalable skimming. Proc. ACM SIGMOD Workshop, Madison, WI, 2002: 9~16.
    [13]胡军涛.视频高层结构分析和挖掘.硕士学位论文,国防科技大学,2002.
    [14] Xie L X, Chang S-F, Divakaran A. Unsupervised mining of statistical temporal structures in video. Book Chapter in Video Mining, Kluwer Academic Publishers, 2003.
    [15]付畅俭,李国辉,武德峰.基于直方图高阶差分聚类的视频结构挖掘.第十三屇全国多媒体技术学术会议,宁波, 2004.10: 12~15.
    [16] fu Chang-jian, li Guo-hui, Dai Ke-xue. A Framework for Video Structure Mining. Fourth International Conference on Machine Learning and Cybernetics(ICMLC05),Guangzhou,china. 18-21 August 2005. 3:1524~1528.
    [17]付畅俭,李国辉.挖掘视频层次结构,改善视频管理.计算机应用研究,2007,24(6):155~157.
    [18] Fu Chang-Jian, Li Guo-Hui, Wu Jun-Tao. Video Hierarchical Structure Mining. International Conference on Communications, Circuits and Systems (icccas06). Guilin, Guangxi, China. June 25 - 28, 2006,2150~2154.
    [19] Fu Chang-jian, Li Guo-hui, Xu Xin-wen, et al. Mining Video Hierarchical Structure for Efficient Management and Access. Fifth International Conference on Machine Learning and Cybernetics(ICMLC06), Dalian, China. 13-16 August 2006,2:1013~1018.
    [20]罗忠祥.视频流中的人体运动提取与运动合成.浙江大学博士学位论文,2002,12.
    [21]马国兵,薛安克.数据挖掘技术在运动目标轨迹预测中的应用.计算机工程与应用, 2004,40(11):210~211.
    [22] Khalid S., Naftel A. Motion trajectory clustering for video retrieval using spatio-temporal approximations. Visual Information and Information Systems,2006,3736: 60~70.
    [23]李建中,王珊编.数据库系统.北京:电子工业出版社,2004.
    [24] Han Jiawei, Kamber Micheline著,范明,孟小峰等译.数据挖掘概念与技术.北京:机械工业出版社,2001.
    [25]李国辉,张军,汤大权.多媒体挖掘.第十屇全国多媒体技术学术会议,北京,2001.
    [26]李宗民,于广斌,刘玉杰.基于内容的视频检索关键技术研究.情报科学,2004,22(07):850~852.
    [27]肖鸿开,吴飞.视频内容分析与检索技术研究现状和未来发展趋势.广播与电视技术,2005,32(6):50~54.
    [28]严明,秦嘉杭.基于文本信息的数字视频检索研究.情报科学,2004,22(7):865~869.
    [29]郑宏.全文检索技术在视频素材检索中的应用.广播与电视技术,2002,29(3):100~104.
    [30]胡军涛,武德峰,李国辉.多媒体数据挖掘的体系结构与方法.计算机工程, 2003,29(9):149~151.
    [31] Chen S. C., Shyu M. L., Zhang C., et al. A multimedia data mining framework: Mining information from traffic video sequences. Journal of Intelligent Information Systems,2002. 19(1):61~77.
    [32] Za?ane Osmar R., Han Jiawei, Li Ze-Nian, et al. Multimediaminer: A system prototype for multimedia data mining. Proc. of ACM SIGMOD Conf. on Management of Data, Seattle, 1998: 581~583.
    [33] Fayyad U M, Piatetsky-Shapiro G, Uthurusamy R. Summary from the KDD-03 panel-data mining:The next 10 years. SIGKDD Explorations,2003,5(2):191~196.
    [34] Flickner M., Sawhney H., Niblack W. Query byimage and video content: The QBIC system. IEEEComputer, 1995, 28(9):23~32.
    [35] Smith J. R., Chang S.-F. Visually searching the web for content. IEEE Multimedia Magazine, Summer, 1997, 4(3):12~20.
    [36] Wactlar H, Hauptmann A, Witbrock M. Informedia: News-on-demand experiments in speech recognition. In Proceedings of ARPA Speech Recognition Workshop. Arden House, Harriman, NY, Feb. 1996:18~21.
    [37] Mostefaoui A., Kosch H., Brunie L. Semantic based prefetching in news-on-demand video servers. Multimedia Tools and Applications,2002. 18(2):159~179.
    [38] Simeon J. Variations on multimedia data mining. In Proceedings of the International Workshop on Multimedia Data Ming, Boston, USA, Aug. 2000.
    [39] Kakimoto Mitsuru, Morita Chie, Tsukimoto Hiroshi. Data mining from functional brain images In: Proceedings of ACM Special Interest Group on Knowledge Discovery in Data and Data Mining (SIGKDD) conference, Boston, USA, 2000:91~97.
    [40] Antonie Maria-Luiza, Zaiane Osmar R., Coman Alexandru. Application of data mining techniques for medical Image classification. MDM/KDD2000 , Boston, MA, USA, 2000.8.
    [41] Kitamoto Asanobu. Data mining for typhoon image collection. ACM SIGKDD conference, San Francisco, USA, 2001: 68~77.
    [42] Honda R., Wang S. A., Kikuchi T., et al. Mining of moving objects from time-series images and its application to satellite weather imagery. Journal of Intelligent Information Systems, 2002. 19(1):79~93.
    [43] Wijesekera Duminda, Barbara aniel. Mining Cinematic Knowledge Work in Progress [An Extended Abstract]. MDM/KDD2000. Boston,MA,USA, 2000:98~103.
    [44]曹加恒,舒风笛,张凯.基于多媒体数据库的数据挖掘系统原型.武汉大学学报, 2000, 46(5): 569~570.
    [45] Simeon J, Simoff, Osmar R Z. Report on MDM/KDD2000: The 1st International Workshop on Multimedia Data Mining. SIGKDD Explorations, 2001, 2(2): 103~105.
    [46] Rasheed Z. Video categorization using semantics and semiotics. Book Chapter in Video Mining, Kluwer Academic Publishers, 2003. http://dimacs.rutgers.edu/Workshops/Video/.
    [47] Han Jiawei, Fu Yongjian, Wang Wei, et al. DBMiner: A system for mining knowledge in large relational databases. In Proc. 1996 Int'l Conf. on Data Mining and Knowledge Discovery (KDD'96),250~255.
    [48] Li Ze-Nian, Zaiane O.R., Yan Bing. C-BIRD: content-based image retrieval from digital libraries usingillumination invariance and recognition kernel. Ninth International Workshop on Database and Expert Systems Applications, 25~28 Aug 1998,361~ 366.
    [49] Matsuo Yuya, Shirahama Kimiaki, Uehara Kuniaki. Video data mining: extracting cinematic rules from movie. SIGKDD, 2003:24~27.
    [50] Lienhart R, Pfeiffer S. Video abstracting. Communications of the ACM, December 1997,40(12):55~62.
    [51] Zhu Xing-Quan, Wu Xindong. Sequential association mining for video summarization. Proc. of IEEE Int. Conf. on Multimedia & Expo (ICME 2003), Baltimore, MD, July 6-9, 2003,3: 333~336.
    [52] Pan Jia-Yu, Faloutsos Christos. VideoCube: A novel tool for video mining and classification. In Proceedings of the Fifth International Conference on Asian Digital Libraries (ICADL), 2002.
    [53] Pan Jia-Yu, Faloutsos Christos. GeoPlot: Spatial data mining on video libraries. CIKM’02, McLean, Virginia, USA, Nov.4-9, 2002.
    [54] Kim Shearer, Chitra Dorai, Svetha Venkatesh. Incorporating domain knowledge with video and voice data analysis in news broadcasts. MDM/KDD. Boston,MA,USA, 2000.
    [55] http://www.informedia.cs.cmu.edu/arda/vaceII.html.
    [56] http://www.merl.com/projects/VideoMining/.
    [57] R Radhakrishnan, Z Xiong, A Divakaran, et al. Generation of sports highlights using a combination of supervised and unsupervised techniques in the audio domain. IEEE PCM, Singapore, 2003.
    [58] A Divakaran, K Miyaraha, A Peker K, et al. Video mining using combinations of unsupervised and supervised learning techniques. SPIE Conference on Storageand Retrieval for Multimedia Databases, January 2004,5307:235~243.
    [59] K-S Goh, K Miyahara, R Radhakrishan, et al. Audio-visual event detection based on mining of semantic audio-visual labels. SPIE Conference on Storage and Retrieval for Multimedia Databases,2004,5307:292~299.
    [60] Lu Xiaoye, Ma Yu-Fei, Zhang Hong-Jiang, et al. An integrated correlation measure for semantic video segmentation. Proc. of IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, August, 2002.
    [61] Hua Xian-Sheng, Yin Pei, Zhang Hong-Jiang. Efficient video text recognition using multiple frame integration. International Conference on Image Processing (ICIP2002), Rochester, New York, Sep. 22-25, 2002.
    [62] Yu-Fei Ma, Lie Lu, Hong-Jiang Zhang, et al. A user attention model for video summarization. ACM Multimedia, Juan-les-Pins, France, December, 2002.
    [63] Hongjiang Zhang. Content-based video analysis, retrieval and browsing. Multimedia Information Retrieval and Management - Technological Fundamentals and Applications. D. Feng, W.C. Siu, and H. J. Zhang. (Etc.), Springer, 2002.
    [64] Wang Wei-Qiang, Gao Wen. Automatic segmentation of news items based on video and audio features. Journal of Computer Science and Technology, Mar 2002, 17(2): 189~195.
    [65]马宇飞,白雪生,徐光佑.新闻视频中口播帧检测方法的研究.软件学报, 2001, 12(3): 377~382.
    [66]姜帆,章毓晋.新闻视频的场景分段索引及摘要生成.计算机学报, 2003, 26(7): 859~865.
    [67]周小四,杨杰,朱一坦.用于监控智能报警系统的图像识别技术.上海交通大学学报,2002,36(4):498~501.
    [68]熊华,老松杨,吴玲琦,等. NewsVideoCAR:一个基于内容的视频新闻节目浏览检索系统.计算机工程,2000,26(11): 73~75.
    [69]樊昀.基于超图聚类的故事单元的抽取与分析.软件学报, 2003, 14(4):857~863.
    [70] Zhu X, Aref W, Fan J, et al. Medical video mining for efficient database indexing, management and access. Proc. 19th Int. Conf. Data Eng., 2003: 569~580.
    [71] Yeung Minerva, Yeo Boon-Lock. Segmentation of Video by Clustering and Graph Analysis. Computer Vision and Image Understanding,1998,71(1):94~109.
    [72] Rasheed Zeeshan, Shah Mubarak. A Graph Theoretic Approach for Scene Detection in Produced Videos. Multimedia Information Retrieval Workshop 2003 in conjunction with the 26th annual ACM SIGIR conference on Information Retrieval, 1 Aug 2003, Toronto, Canada.
    [73] Rui Y., Huang T. S., Mehrotra S. Constructing table-of-content for videos. Multimedia Systems, 1999. 7(5):359~368.
    [74] Zhang H. J., Kankanhalli A., Smoliar S. Automatic Partitioning of Full-Motion Video. Multimedia Systems, 1993,1(1):10~28.
    [75] Hampapur Arun., Jain R. C., Weymouth T. Production Model Based Digital Video Segmentation. Multimedia Tools and Applications,1995,1(1)9~46.
    [76] Li S. Z. Content-based Classification and Retrieval of Audio Using the Nearest Feature Line Method. IEEE Transactions on Speech and Audio Processing, 2000, 8(5): 619~625.
    [77] Ngo Tchong-wah, Zhang Hong-jiang, Pong Tting-chuen. Recent advances incontent based video analysis. International Journal of Image and Graphics,2001,1(3):445~468.
    [78] Jeroen Vendrig, Marcel Worring. Systematic evaluation of logical story unit segmentation. IEEE Transactions on Multimedia,2002, 4 (4) : 492~499.
    [79] Hanjalic A., Lagendijk R.L., Biemond J. Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Transactions on Circuits and Systems for Video Technology, 1999,9(4):580~588.
    [80] Tavanapong Wallapak, Jun-yu Zhou. Shot Clustering Techniques for Story Browsing. IEEE Transactions on Multimedia,2004,6(4):517~527.
    [81] Hanjalic Alan, L.Lagedijk Reginald, Biemond Jan. Automaticallly Segmenting Movies into Logical Story Units. Faculty of Information Technology and Sysytems information and Communication Theory Group.NEC Research Institute,1999.
    [82] Ngo Chong-Wah, Chuen Ting, Pong Hongjiang Zhang. On Clustering and Retrieval of Video Shots. ACM Multimedia 2001, Ottawa, Canada, September 30 - October 5, 2001:51~56.
    [83] Hanjalic A., Zhang H. J. An integrated scheme for automated video abstraction based on unsupervised cluster-validity analysis. IEEE Trans. On Circuits and Systems for Video Technology, Dec.1999, 9(8): 1280~1289.
    [84] Jain A. K., Vailaya A., Xiong W. Query by Video Clip. Multimedia Systems: Special Issue on Video Libraries, 1999, 7(5):369~384.
    [85]熊华,胡晓峰.一种不需要经验参数的视频镜头自校正聚类方法.中国图象图形学报,2001,(3):243~249.
    [86] Smith John R., Chang Shih-fu. Visualseek: a fully automated content-based image query system. The Fourth ACM Int'l Multimedia Conf 96 Proc (ACM Multimedia 96), Boston, 1996: 87～98.
    [87] Yu H, Wolf W. A Visual Search System for Video and Image Databases. IEEE Int'l Conf. on Multimedia Computing and Systems(Ottawa, Canada), June, 1997: 517~524.
    [88] Pentl A., Picard R. W., Sclaroff S. Photobook: Content-Based Manipulation of Image Databases. the SPIE Conference on Storage and Retrieval of Image and Video Databases II,1994:34~47,.
    [89] Jain Ramesh. InfoScopes: Next Generation of Multimedia Information Systems. Multimedia Systems and Techniques, Kluwer Academic Publishers,1996.
    [90] Evers Marc, Nijholt Anton. Jacob - An animated instruction agent in virtual reality Third International Conference on Advances in Multimodal Interfaces table of contents,2000: 526 ~ 533.
    [91] Chua Tat-Seng, Lim Swee-Kiew, Pung Hung-Keng. Content-based Retrieval of segmented Images. ACM Multimedia , 1994.
    [92] Fischer S., Lienhart R., W. Effelsberg. Automatic recognition of film genres. The 3rd ACM International Multimedia Conference and Exhibition,1995, 1: 295~304.
    [93] Truong B T, Venkatesh S, Dorai C. Automatic genre identification for content-based video categorization. International Conference Pattern Recognition,2000, 4:230~233.
    [94] Chen Y, Wong E K. A knowledge based approach to video content classification. Proceedings of SPIE on Storage and Retrieval for Media Databases,2001: 292~300.
    [95] Zhou Wensheng, Dao Son, Kuo C.-C. Jay. On-line knowledge- and rule-basedvideo classification system for video indexing and dissemination. Information Systems,2002,27(8):559~586.
    [96] Shearer K, Dorai C, Venkatesh S. Incorporating domain knowledge with video and voice data analysis in news broadcasts. ACM International Conference on Knowledge Discovery and Data Mining,2000: 46~53.
    [97] Haering N C Qian R J, Sezan M I. . . A semantic event detection approach and its application to detecting hunts in wildlife video. IEEE Transaction on Circuits and Systems for Video Technology,2000, 10(6): 857~868.
    [98] Chang C W, Lee S Y. A video information system for sport motion analysis. Journal of Visual Languages and Computing,1998,8: 265~287.
    [99] Yow D, Yeo B L, Yeung M. Analysis and presentation of soccer highlights from digital video. Proc. Asian Conference on Computer Vision,1995: 499~503.
    [100] Brand J D, Mason J S D, Pawlweski M. Face detection in color images. International Conference Image Processing. 2000,24(5): 696~706.
    [101] Zhong D, Zhang H J, Chang S-F. Clustering methods for video browsing and annotation. Proc. of Storage and Retrieval for Image and Video Databases IV, San Jose, CA, USA, 1996, 2670: 239~246.
    [102]沈清,汤霖.模式识别导论.长沙:国防科大出版社, 1990.
    [103] Nakajima Y. A Video Browsing Using Fast Scene Cut Detection for an Efficient Networked Video Database Access. Ieice Transactions on Information and Systems,1994. E77d(12): 1355~1364.
    [104]薛峰.基于内容检索的图象和视频存储结构和索引技术的研究和实现.硕士学位论文,长沙:国防科学技术大学, 1999.
    [105] http://www.cis.temple.edu/~latecki/.
    [106] Oh JungHwan, Lee JeongKyu, Hwang Sae. Video data dining: current status and challenges. Book Chapter in Encyclopedia of Data Warehousing and Mining, Idea Group Inc. and IRM Press. 2005.
    [107] Divakaran A. Video mining using unsupervised clustering of video content. http://cxp.paterra.com/uspregrant20040085323.html, 2005-06-10.
    [108] http://166.111.247.12/research.htm.
    [109]曹莉华,胡晓峰,李国辉.基于内容检索中的视频处理技术研究.计算机工程与应用,1998,(6):39~41.
    [110] Kohonen T. Self-Organizing Maps. Spinger-Verlag,1995.
    [111] Huang C. L., Shih H. C., Chao C. Y. Semantic analysis of soccer video using dynamic Bayesian network. Ieee Transactions on Multimedia, 2006,8(4):749~760.
    [112] Mittal A., Cheong L. F. Framework for synthesizing semantic-level indices. Multimedia Tools and Applications, 2003,20(2):135~158.
    [113] Chen M-Y., Hauptmann A. Discriminative Fields for Modeling Semantic Concepts in Video Eighth Conference on Large-Scale Semantic Access to Content (RIAO'07), (Text, Image video and Sound), Pittsburgh, PA, May 30-June 1, 2007.
    [114] Bae T. M., Kim C. S., Jin S. H., et al. Semantic event detection in structured video using hybrid HMM/SVM. Image and Video Retrieval, Proceedings, 2005, 3568:113~122.
    [115] Adams W. H., Iyengar G., Lin C. Y., et al. Semantic indexing of multimedia content using visual, audio, and text cues. Eurasip Journal on Applied SignalProcessing, 2003,(2):170~185.
    [116]代科学,李国辉.一种基于Petri网的监控视频事件抽取方法.电视技术,2006(1):83~85.
    [117] Jiang Haitao, Hetal AbdelSalam. Scene change detection techniques for video database systems. Multimedia Systems, 1998,(6): 186~195.
    [118] Zhang H. J., Jianhua Wu, Di Zhong, et al. An integrated system for content-based video retrieval and browsing. Pattern Recognition, 1997, 30(4): 643~657.
    [119] Toller M S, Lewis, Nixon M S. Video segmentation using combined cues. Proc. SPIE, 1997, 3312:414~425.
    [120] Patel N. V., Sethi I. K. Video shot detection and characterization for video databases. Pattern Recognition, 1997. 30(4):583~592.
    [121] Zhang H. J., et al. Video parsing, retrieval and browsing: An integrated and content-based solution. ACM Multimedia'95, San Francisco, 1995, 15~24.
    [122] Moon-Ho Song S, Tae-Hoon Kwon. On detection of gradual scene changes for parsing of video data. SPIE, 1997, 3312:404~409.
    [123] Hwang H. C., Kim D. G. Shot detection from MPEG compressed video. Ieice Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2004. E87a(6): 1509~1513.
    [124] Yeo B. L. On fast microscopic browsing of MPEG-compressed video. Multimedia Systems,1999,7(4):269~281.
    [125] Gunsel B., Tekalp A. M., van Beek P. J. L. Content-based access to video objects: Temporal segmentation, visual summarization, and feature extraction. Signal Processing,1998,66(2):261~280.
    [126] Yeung M M, Yeo B L. Video visualization for compact presentation and fast browsing of pictorial content. IEEE Transactions on Circuits and Systems for Video Technology, 1997, 7(5): 771~785.
    [127] wayne Wolf. Key frame selection by motion analysis. IEEE Int. Conf. On Acoustics, Speech and Signal Processing, ICASSP, Atlanta, 1996, 7~10.
    [128]荆其诚.色度学.北京:科学出版社, 1979.
    [129] Amir A, Berg M. IBM research TRECVID 2003 video retrieval system. TRECVID Workshop, USA :Washington D C ,2003.
    [130] Rui Y., Huang T.S. A uniform framework for video browsing and retrieval. Image and Video Processing Handbook, Academic Press, 2000,705~715.
    [131] Ngo C.W., Pong T.C., Zhang H.J. Motion-Based video representation for scene change detection. ICPR 2000. Barcelona, Spain, 2000.
    [132] Corridoni J.M., Bimbo A.D. Structured representation and automatic indexing of movie information content. Pattern Recognition, 1998,31(12):2027~2045.
    [133] Rui Y., Huang T.S., Mehrotra S. Exploring video structure beyond the shots. IEEE Conference on Multimedia Computing and Systems. 1998. 237~240.
    [134] Kender J.R., Yeo B.L. Video scene segmentation via continuous video coherence. IEEE International Conference on Computer Vision and Pattern Recognition. 1998. 367~373.
    [135] Manjunath B. S., Ohm Jens-Rainer, Vinod V. Vasudevan , et al. Color and Texture Descriptors. IEEE Trans. Circuits Syst. Video Technol., 2001,11:703~715.
    [136] Won C.S., Park D.K., Park S.J. Efficient use of mpeg-7 edge histogram descriptor. ETRI J. ,2002,24 (1)23~30.
    [137] Lin T., Zhang H., Shi Q. Video Scene Extraction by Force Competition. Proceeding of ICME2001 Conf., Tokyo, 2001,753~756.
    [138] L. Rabiner, B.H. Juang. Fundamentals of Speech Recognition. Prentice Hall: Englewood Cliffs, NJ, 1993.
    [139] D.A.Reynolds, R.C.Rose. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans Speech and Audio Processing,1995,3(1):72~83.
    [140] D.A.Reynolds. Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 1995,17(1):91~108.
    [141] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition Proc. IEEE, 1989, 77(2): 257～286.
    [142] Cover T. M., Tomas J. A. Elements of Information Theory. John Wiley & Sons, 1991,18~19.
    [143] Hoashi K., Sugano M., Naito M., et al. Shot Boundary Determination on MPEG Compressed Domain and Story Segmentation Experiments for TRECVID 2004. TREC Video Retrieval Evaluation Forum, 2004.
    [144] Hsu W., Chang S.F. Generative, Discriminative, and Ensemble Learning on Multi-Model Perceptual Fusion Toward News Video Story Segmentation. International Conference on Multimedia and Expo, 2004.
    [145] Chaisorn L., Chua T-S., Lee C-H. The Segmentation of News Video into Story Units. International Conference on Multimedia and Expo, 2002.
    [146] Zhai Yun, Yilmaz Alper, Shah Mubarak. Story Segmentation in News Videos Using Visual and Text Cues. W.-K. Leow et al. (Eds.): CIVR 2005, LNCS 3568:92~102.
    [147]肖鹏,吴玲达,老松杨,等.一种基于可视化的新闻视频挖掘方法.情报学报,2004,23(3):307~312.
    [148]文军,曾璞,徐建军,等.多模态特征融合的新闻视频故事分割方法.小型微型计算机系统,2008,29(1):171~174.
    [149] Sundaram H., Chang S. F. Computable scenes and structures in films. IEEE Transactions on Multimedia,2002. 4(4):482~491.
    [150] Cheng-Yu Wei, Dimitrova N., Shih-Fu Chang. Color-mood analysis of films based on syntactic and psychological models. IEEE International Conference on Multimedia and Expo, 2004,2:831~834.
    [151] Lienhart Rainer. Comparison of Automatic Shot Boundary Detection Algorithms. Storage and Retrieval for Still Image and Video Databases VII , 1999.
    [152] Duan L. Y., Jin J. S., Tian Q., et al. Nonparametric motion characterization for robust classification of camera motion patterns. IEEE Transactions on Multimedia, 2006. 8(2):323~340.
    [153] Gersho A., Gray R. M. Vector Quantization and Signal Compression. Norwell, MA, USA: Kluwer Academic Publishers,1992.
    [154] Tzu-Chao Lin, Pao-Ta Yu. A new unsupervised competitive learning algorithm for vector quantization. 9th International Conference on Neural Information Processing, 2002:944~948.
    [155] Lloyd S. P. Least-square quantization in pcm. IEEE Transactions on Information Theory, 1982. 28:129~137.
    [156] Linde Y., Buzo A., Gray R. M. An algorithm for vector quantization design. IEEE Transactions on Communications, 1980,28(1):84~95.
    [157] Ahalt S. C., et al. Competitive learning algorithms for vector quantization. Neural Networks, 1990. 3(3):277~290.
    [158] Agrawal R., Imielinski T., Swami A. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of data,1993,207~216.
    [159] Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. of the 20th VLDB, Aantiago Chile,September 1994:487~499.
    [160] Mannila H., Toivonen H., Verkamo A. Efficient algorithm for discovering association rules. AAAI Workshop on Knowledge Discovery in Databases, 1994,181~192.
    [161] J.Han, J.Pei, Y.Yin.Mining. Frequent patterns without candidate generation. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00),2000,1~12.
    [162] Park J. S., Chen M. S., Yu P. S. An effective hash-based algorithm for mining association rules. ACM SIGMOD International Conference on Management of Data,1995,175~186.
    [163] Lafferty J., McCallum A., Pereira F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. 18th International Conf. on Machine Learning, 2001,282~289.
    [164] McCallum A., Freitag D., Pereira F. Maximum entropy Markov models for information extraction and segmentation. ICML, Stanford, California, 2000,591~598.
    [165] Xie L, Xu P, Chang S F, et al. Structure analysis of soccer video with domain knowledge and hidden markov models. Pattern Recogn. Lett., 2004, 25(7): 767~775.
    [166]曹建荣.一种基于语义的视频场景分割算法.中国图象图形学报,2006,11(11):1657~1660.
    [167] Smeaton A., Over P. Trecvid: Benchmarking the effectiveness of infomration retrieval tasks on digital video. Intl. Conf. on Image and Video Retrieval, 2003.
    [168] Natsev A. P., Naphade M. R., Tesic J. Learning the semantics of multimedia queries and concepts from a small number of examples. 13th Annual ACM Int'l Conf. on Multimedia,2005,598~607.
    [169] Snoek C. G. M., Worring M., Gemert J. C. van, et al. The challenge problem for automated detection of 101 semantic concepts in multimedia. 14th Annual ACM Int'l Conf. Multimedia, 2006,421~430.
    [170] Chen M., Chen S. C., Shyu M. L., et al. Semantic event detection via multimodal data mining. IEEE Signal Processing Magazine, 2006, 23(2): 38~46.
    [171] Wu Y., Tseng B.L., Smith J.R. Ontology-based multi-classification learning for video concept detection. IEEE International Conference on Multimedia and Expo (ICME),2004.
    [172] Naphade M., Kristjansson T., Frey B., et al. Probabilistic multimedia objects (multijects): A novel approach to indexing and retrieval in multimedia systems. Fifth IEEE International Conference on Image Processing, vol. 3, Chicago, IL, Oct 1998, 536~540.
    [173] Kumar Sanjiv, Hebert Martial. Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR),2003.
    [174] Snoek C.G.M., Worring M., Geusebroek J.M., et al. The MediaMill TRECVID 2004 semantic video search engine. TRECVID, 2004.
    [175] Hauptmann A., Chen M.-Y., Christel M., et al. Confounded expectations: Informedia at trecvid 2004. TRECVID, 2004.
    [176] Yan R., Chen M.-Y., Hauptmann A. Mining relationship between video concepts using probabilistic graphical models. IEEE International Conference on Multimedia and Expo (ICME),2006.
    [177] McCullagh P., Nelder J. A. Generalised Linear Models. Chapman and Hall, London, 1987.
    [178] Jolliffe I.T. Principal Component Analysis. Springer-Verlag, New York,2002.
    [179] Li S. Z. Markov Random Field Modeling in Image Analysis. Springer-Verlag, Tokyo, 2001.
    [180] Gilks W.R., Richardson S., Spiegelhalter D.J. Markov Chain Monte Carlo in Practice. Chapman and Hall, London,1996.
    [181] Minka T. P. Algorithms for Maximum-Likelihood Logistic Regression. Statistics Tech Report 758, Carnegie Mellon University, 2001.
    [182] Yedidia J.S., Freeman W.T., Weiss Y. Understanding belief propagation and its generalizations. Exploring artificial intelligence in the new millennium, Morgan Kaufmann Publishers Inc. San Francisco,2003,239~269.
    [183] Amir Arnon, Argillander Janne, Campbell Murray, et al. IBM Research TRECVID-2005 Video Retrieval System. NIST TRECVID-2005 Workshop, Gaithersburg, MD, November 2005.
    [184] Yan R., Hauptmann A., Chen M-Y. Mining Relationship between Video Concepts Using Probabilistic Graphical Model. IEEE International Conference On Multimedia and Expo (ICME'06), July 9-12, 2006.
    [185] Yang Jun, Yan Rong, Hauptmann Alexander G. Cross-domain video concept detection using adaptive svms. 15th international conference on Multimedia,Augsburg, Germany,2007:188~197.
    [186] Hauptmann A.G., Chen M.-Y. , Christel M., et al. A Hybrid Approach to Improving Semantic Extraction of News Video. International Conference on Semantic Computing,Irvine, CA, USA,17-19 Sept. 2007:79~86.
    [187] Chang Shih-Fu. Video Pattern Mining. Keynote Presentations on International Symposium on Intelligent Multimedia, Video & Speech Processing, Hong Kong, 2004.
    [188] Boreczky J S, Wilcox L D. A hidden Markov model framework for video segmentation using audio and image features. ICASSP'98, Seattle, May, 1998:3741~3744.
    [189] Wang Yuan-Kai, Chang Chih-Yao. Movie Scene Classification Using Hidden Markov Model. 16th IPPR Conference on Computer Vision, Graphics and Image Processing (CVGIP),2003.
    [190] Kijak E, Gravier G, Gros P, et al. HMM based structuring of tennis videos using visual and audio cues. 2003 Int'l Conf on Multimedia and Expo,Baltimore,2003.
    [191] Xu P., Xie L., Chang S.-F., et al. Algorithms and systems for segmentation and structure analysis in soccer video. IEEE International Conference on Multimedia and Expo. Tokyo, Japan: IEEE Press, 2001: 928~931.
    [192] Tovinkere V., Qian R. J. Detecting semantic events in soccer games: towards a complete solution. IEEE International Conference on Multimedia and Expo. Tokyo, Japan: IEEE Press, 2001: 833~836.
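    [193] Jin Guoying, Tao Linmi, Xu Guangyou, et al. An HHMM-based method for multi-cue fusion and event inference. Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版)), 2007, 47(1): 112~115.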
    [194] Xie L., Chang S.-F., Divakaran A., et al. Structure analysis of soccer video with hidden Markov models. IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando: IEEE Press, 2002, IV:4096~4099.
    [195] Xu G., Ma Y.-F., Zhang H.-J., et al. A HMM based semantic analysis framework for sports game event detection. IEEE International Conference on Image Processing. Barcelona: IEEE Press, 2003,125~128.
    [196] Fine S., Singer Y., Tishby N. The hierarchical hidden Markov model: Analysis and applications. Machine Learning,1998,32(1):41~62.
    [197] Murphy Kevin, Paskin M. Linear Time Inference in Hierarchical Hidden Markov Models. Advances in Neural Information Processing Systems, Cambridge: MIT Press, 2001.
    [198] Xie Lexing, Chang Shih-Fu, Divakaran Ajay, et al. Unsupervised Discovery of Multilevel Statistical Video Structures Using Hierarchical Hidden Markov Models. International Conference on Multimedia and Expo, Baltimore, USA, July 2003, 29~32.
    [199] L. Xie, S.-F. Chang, A. Divakaran, et al. Learning Hierarchical Hidden Markov Models for Video Structure Discovery. Technical Report ADVENT-2002-006, Dept. Electrical Engineering, Columbia Univ., Available at: http://www.ee.columbia.edu/?xlx/research/.
    [200] Hu M., Ingram C., Sirski M., et al. A hierarchical HMM implementation for vertebrate gene splice site prediction. Technical report, Dept. of Computer Science, University of Waterloo,2000.
    [201] Ivanov YA, Bobick AF. Recognition of visual activities and interactions by stochastic parsing. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2000,22(8):852~872.
    [202] Bach Nguyen Huu, Shinoda Koichi, Furui Sadaoki. Robust Highlight Extraction Using Multi-stream Hidden Markov Models for Baseball Video. Proceedings of IEEE International Conference on Image Processing, Genova, Italy, 2005, 3:173~176.
    [203] Chang Peng, Han Mei, Gong Yihong. Extract Highlights from Baseball Game Video with Hidden Markov Models. IEEE Int. Conf. on Image Processing, Sept. 2002:609~612.
    [204] Jin G Y, Tao L M, Xu G Y. Slow-motion replay detection in soccer videos based on multi-level HMM integrated with shot detection. Proceedings of WIAMIS,2005.
