Video Advertisement Classification Method Based on Shot Segmentation and Spatial Attention Model
  • Authors: TAN Kai; WU Qing-bo; MENG Fan-man; XU Lin-feng (School of Information and Communication Engineering, University of Electronic Science and Technology of China)
  • Keywords: Classification; Video advertisement; Attention; Annotation
  • Journal: Computer Science (计算机科学)
  • CNKI journal code: JSJA
  • Publication date: 2019-03-15
  • Year: 2019
  • Volume: 46
  • Issue: 03
  • Pages: 137-142 (6 pages)
  • Record ID: JSJA201903019
  • CN: 50-1075/TP
  • Funding: National Natural Science Foundation of China (61601102, 61502084, 61871087)
  • Language: Chinese
Abstract
As video advertisements are increasingly used in areas such as retrieval and user recommendation, advertisement video classification has become an important issue and poses a significant challenge for computer vision. Unlike existing video classification tasks, advertisement video classification faces two challenges. First, advertised products appear in advertisement videos aperiodically and sparsely, so most frames are irrelevant to the advertisement category and can interfere with classification models. Second, advertisement videos contain complex backgrounds, which makes it hard to extract useful information about the product. To solve these problems, this paper proposed an advertisement video classification method based on shot segmentation and a spatial attention model (SSSA). To address the interference of irrelevant frames, a shot-based partitioning method was used to sample frames. To counter the influence of complex backgrounds on feature extraction, an attention mechanism was embedded into SSSA to locate products and extract discriminative features from the attention area most related to the advertised products, and an attention prediction network (APN) was trained to predict the attention map. To verify the proposed model, this paper introduced a new thousand-video dataset for advertisement video classification, named TAV, and gaze data were also collected to train the APN. Experiments on the TAV dataset demonstrate that the proposed model improves performance by about 10% compared with state-of-the-art video classification methods.
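The two stages of the pipeline described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the histogram representation, the hard-cut threshold, and the function names (`shot_boundaries`, `attention_pool`) are assumptions, and the real SSSA predicts the attention map with a CNN-based APN trained on gaze data rather than taking it as a given input.

```python
def hist_diff(h1, h2):
    # L1 distance between two normalized color histograms.
    return sum(abs(a - b) for a, b in zip(h1, h2))

def shot_boundaries(hists, threshold=0.5):
    # A frame whose histogram differs sharply from its predecessor
    # starts a new shot (simple hard-cut detection; illustrative only).
    cuts = [0]
    for i in range(1, len(hists)):
        if hist_diff(hists[i - 1], hists[i]) > threshold:
            cuts.append(i)
    return cuts

def sample_one_frame_per_shot(frames, hists, threshold=0.5):
    # Sampling by shot rather than uniformly limits how many
    # category-irrelevant frames reach the classifier.
    cuts = shot_boundaries(hists, threshold)
    cuts.append(len(frames))
    # Take the middle frame of each shot as its representative.
    return [frames[(cuts[i] + cuts[i + 1]) // 2] for i in range(len(cuts) - 1)]

def attention_pool(features, attention):
    # Weight each spatial feature vector by its normalized attention
    # score and sum, so product-related regions dominate the descriptor.
    total = sum(attention)
    dim = len(features[0])
    pooled = [0.0] * dim
    for feat, w in zip(features, attention):
        for d in range(dim):
            pooled[d] += feat[d] * (w / total)
    return pooled
```

For example, six frames whose histograms flip from `[1, 0]` to `[0, 1]` at frame 3 yield two shots, and one representative frame is drawn from the middle of each before attention-weighted pooling builds the final descriptor.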
