Research on Affective Content Recognition in Music Videos (MV)
Abstract
In recent years, with the development of computer network technology and digital media processing, the volume of digital video, image, and audio data has grown enormously and its applications have become widespread. Organizing, classifying, and retrieving media by its semantic content has therefore become an urgent problem. However, owing to differences in cultural background and other factors, people judge and perceive audio-visual media differently, especially with respect to its emotional semantics. Research on emotion recognition is thus of great significance for improving digital media annotation and retrieval and for enhancing the emotional interaction capability of digital entertainment products.
     Emotion is one characteristic of video and images, and the essential characteristic of music. Taking music video as its research object, this dissertation recognizes personalized emotional content from the audio-visual features of music videos using machine-learning methods, working from the perspective of individual emotional cognition, so as to bridge the semantic gap between low-level audio-visual features and the high-level semantics of human emotion. The work focuses on constructing and annotating a music-video training set, building the emotion model and emotion subspaces, extracting audio-visual and music-theoretic features, recognizing personal emotion in music videos, and generating music-video summaries. The main contributions are as follows:
     1) Construction of a user's personalized emotion subspaces for music videos.
     Music video is an audio-visual medium strongly tied to personal emotional preference. To represent personal emotion effectively, this dissertation proposes an Arousal-Valence-Preference (AVP) psychological model that can express both discrete and continuous individual emotions, with emotion values annotated on a Likert scale. To better capture an individual's personalized emotion space, a finite mixture of Student's t factor analyzers with Kullback-Leibler fuzzy c-means clustering (MSFA-KLFCM) is used to partition the emotion subspaces, and a Student's t-distribution mixture model (TMM) is used to estimate subspace memberships and validate the resulting partition. Experimental results show that the partitioned emotion subspaces effectively represent an individual's personalized emotions toward music videos.
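     To make the subspace construction concrete, here is a minimal sketch of plain fuzzy c-means over hypothetical per-user (arousal, valence, preference) Likert ratings. It deliberately omits the Student's t factor analyzers and the Kullback-Leibler term of the MSFA-KLFCM method described above; the data, cluster count, and function names are all illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means: returns cluster centers and an (n x c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = U ** m                            # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Hypothetical annotations by one user: (arousal, valence, preference) on a 1-9 Likert scale.
ratings = np.array([[7., 8., 9.], [6., 7., 8.], [2., 3., 2.], [3., 2., 1.], [5., 5., 6.]])
centers, U = fuzzy_c_means(ratings, c=2)
print(centers.round(2))   # prototype emotion of each subspace
print(U.round(2))         # soft membership of each clip in each subspace
```

     The soft memberships play the role that the TMM-estimated memberships play in the dissertation: a clip may belong partly to several emotion subspaces rather than to exactly one.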
     2) Extraction of audio-visual features from music videos.
     Emotion recognition in music videos builds on their characteristic audio-visual features. Music is a special perceptual carrier and, above all, an expression of human emotion. Drawing on music theory and music psychology, this dissertation designs and selects a set of emotion-related audio-visual features. As a high-level music-theoretic feature, chords express musical emotion well, so a chord histogram is introduced as a feature, together with a new chord recognition method that analyzes the spectral properties of music in the time-frequency domain based on the Resonator Time-Frequency Image (RTFI). A new salient chroma vector feature is also proposed, exploiting the overtone structure of chords, and chords are extracted by expectation maximization over chord templates. Beat features are introduced as a post-processing step to improve recognition accuracy. Comparative experiments show that the proposed algorithm achieves better recognition accuracy and robustness.
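     As a simplified illustration of chord-template matching (leaving out the RTFI front end, the salient chroma vector, the expectation-maximization fitting, and the beat-based post-processing described above), the sketch below scores a 12-bin chroma vector against binary major/minor triad templates; the chroma frame and all names in it are assumptions for demonstration.

```python
import numpy as np

PITCHES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def triad_templates():
    """Binary 12-bin templates for the 24 major and minor triads."""
    templates, names = [], []
    for root in range(12):
        for quality, intervals in (('maj', (0, 4, 7)), ('min', (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            templates.append(t)
            names.append(f'{PITCHES[root]}:{quality}')
    return np.array(templates), names

def recognize_chord(chroma):
    """Return the triad whose normalized template best correlates with the chroma."""
    T, names = triad_templates()
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    chroma = chroma / (np.linalg.norm(chroma) + 1e-12)
    return names[int(np.argmax(T @ chroma))]

# A chroma frame with energy on C, E, G plus a little noise should map to C:maj.
frame = np.zeros(12)
frame[[0, 4, 7]] = 1.0
frame += 0.05 * np.abs(np.random.default_rng(0).normal(size=12))
print(recognize_chord(frame))   # -> C:maj
```

     A per-clip chord histogram is then simply the normalized count of recognized chord labels over all frames.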
     3) Personalized emotion recognition based on localized multiple kernel regression.
     The audio track of a music video is temporally dynamic. This dissertation proposes a dynamic texture model over music features (Mel-frequency cepstrum and chroma spectrum) to capture both their appearance and their dynamics. Each piece of music is treated as a linear dynamical system, and a bag-of-systems histogram over dynamic textures serves as a new feature for music-video emotion recognition. Because the visual and auditory features of a music video contribute differently to identifying its personalized emotional content, localized multiple kernel regression (LMKR) is adopted to predict personalized emotion values. Experimental results show that combining the bag-of-systems histogram with chord features represents and recognizes the personalized emotional content of music videos more effectively.
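     The dynamic-texture view of a feature sequence can be sketched as fitting a linear dynamical system. Below is a minimal Doretto-style suboptimal fit on random stand-in data; the full pipeline would additionally compare fitted systems (e.g., by the Martin distance), quantize them into a bag-of-systems histogram, and feed that histogram to the LMKR regressor, none of which is shown here.

```python
import numpy as np

def fit_dynamic_texture(Y, n_states=5):
    """Suboptimal LDS fit: y_t = C x_t + mean, x_{t+1} = A x_t (SVD-based)."""
    mean = Y.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n_states]                            # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]      # estimated latent state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])       # least-squares transition matrix
    return A, C, mean

# Stand-in for a 20-dimensional MFCC/chroma sequence over 200 frames.
Y = np.random.default_rng(1).normal(size=(20, 200))
A, C, mean = fit_dynamic_texture(Y)
print(A.shape, C.shape)   # (5, 5) (20, 5)
```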
     4) A music-video summarization algorithm based on visual image complexity.
     This dissertation proposes a static video summarization algorithm for music videos that extracts keyframes based on visual image complexity. The music video is first segmented into sub-shots; then, with the shot as the basic unit, candidate keyframes are extracted using image visual complexity as the similarity measure; finally, since shots contain redundant information, the candidate keyframes are clustered with a hierarchical fuzzy c-means algorithm to remove redundancy, and the remaining keyframes are arranged in their original temporal order to form the summary. The summaries are evaluated with TRECVID-style objective criteria. Experimental results show that the summaries generated by the proposed algorithm achieve good compression ratio, fidelity, and shot reconstruction degree.
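     To give the flavor of complexity-driven keyframe selection, the sketch below uses grayscale entropy as a crude stand-in for the dissertation's visual-complexity measure and skips shot detection and the hierarchical fuzzy c-means deduplication; the synthetic frames, threshold, and names are illustrative assumptions.

```python
import numpy as np

def frame_entropy(gray):
    """Shannon entropy of an 8-bit grayscale frame: a crude visual-complexity proxy."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256), density=True)
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

def candidate_keyframes(frames, threshold=0.25):
    """Keep a frame when its complexity departs enough from the last kept frame."""
    keep = [0]
    last = frame_entropy(frames[0])
    for i in range(1, len(frames)):
        h = frame_entropy(frames[i])
        if abs(h - last) / max(last, 1e-6) > threshold:
            keep.append(i)
            last = h
    return keep

# Synthetic 30-frame shot whose content changes abruptly at frame 15.
rng = np.random.default_rng(2)
shot = [rng.integers(100, 120, (120, 160)).astype(np.uint8) for _ in range(15)]
shot += [rng.integers(0, 256, (120, 160)).astype(np.uint8) for _ in range(15)]
print(candidate_keyframes(shot))   # -> [0, 15]
```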
     The research in this dissertation is driven by users' needs for emotional cognition of music videos. It studies the mapping between the audio-visual features of music videos and users' emotions, helping users find, among vast amounts of audio-visual media, the music videos that interest them and match their emotional state. The results on music-video emotion cognition also offer new ideas and methods for research on, and applications of, emotion recognition in new digital media.
