Recurrent 3D attentional networks for end-to-end active object recognition
  • Title (English): Recurrent 3D attentional networks for end-to-end active object recognition
  • Authors: Min Liu; Yifei Shi; Lintao Zheng; Kai Xu; Hui Huang; Dinesh Manocha
  • Affiliations: School of Computer, National University of Defense Technology; Department of Computer Science and Electrical & Computer Engineering, University of Maryland; Visual Computing Research Center, Shenzhen University
  • Keywords (English): active object recognition; recurrent neural network; next-best-view; 3D attention
  • Journal: Computational Visual Media (计算可视媒体)
  • Journal code: CVME
  • Publication date: 2019-03-15
  • Year: 2019
  • Volume: 5
  • Issue: 01
  • Pages: 92-104 (13 pages)
  • Article ID: CVME201901008
  • CN: 10-1320/TP
  • Funding: Supported by the National Natural Science Foundation of China (Nos. 61572507, 61622212, and 61532003) and by the China Scholarship Council
  • Language: English
Abstract
Active vision is inherently attention-driven: an agent actively selects views to attend to in order to rapidly perform a vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we address multi-view depth-based active object recognition using an attention mechanism, by use of an end-to-end recurrent 3D attentional network. The architecture takes advantage of a recurrent neural network to store and update an internal representation. Our model, trained with 3D shape datasets, is able to iteratively attend to the best views of a target object in order to recognize it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, allowing training with backpropagation, and so achieves much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, with only depth input, achieves state-of-the-art next-best-view performance in terms of both time taken and recognition accuracy.
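To make the architecture described above concrete, the following is a minimal, illustrative PyTorch sketch of such a recurrent attentional loop: a CNN encodes each observed depth image, a GRU cell stores and updates the internal representation, and a regression head proposes the next view. All module sizes, the GRUCell choice, and the toy renderer standing in for the paper's differentiable 3D spatial transformer are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentAttentionNet(nn.Module):
    def __init__(self, num_classes=40, hidden_dim=256):
        super().__init__()
        self.hidden_dim = hidden_dim
        # CNN encoder for a single-channel depth image; 64x64 input assumed.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, hidden_dim),
        )
        # GRU cell stores and updates the internal representation across views.
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        # Regression head proposes the next view (e.g., azimuth/elevation).
        # Plain regression keeps the loop differentiable end to end, standing
        # in for the paper's differentiable 3D spatial transformer.
        self.next_view = nn.Linear(hidden_dim, 2)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, render_view, shape, n_glimpses=3):
        b = shape.size(0)
        h = torch.zeros(b, self.hidden_dim, device=shape.device)
        view = torch.zeros(b, 2, device=shape.device)  # arbitrary start view
        for _ in range(n_glimpses):
            depth = render_view(shape, view)      # (B, 1, 64, 64) depth map
            h = self.rnn(self.encoder(depth), h)  # update internal state
            view = torch.tanh(self.next_view(h))  # propose next-best view
        return self.classifier(h)                 # classify from final state

def toy_renderer(shape, view):
    # Stand-in for a differentiable depth renderer: returns a fake depth
    # image that depends differentiably on the view parameters.
    b = shape.size(0)
    return shape + view.sum(dim=1).view(b, 1, 1, 1)

if __name__ == "__main__":
    net = RecurrentAttentionNet()
    shapes = torch.randn(2, 1, 64, 64)  # fake stand-in for rendered 3D shapes
    logits = net(toy_renderer, shapes)
    logits.sum().backward()   # gradients reach the view-selection head
    print(logits.shape)       # torch.Size([2, 40])
```

Because the view regression and the (assumed differentiable) renderer sit inside the forward pass, the classification loss backpropagates into view selection; this is the property that, per the abstract, lets the approach avoid the slower-converging reinforcement learning used by most attention-based models.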
