Action recognition using lie algebrized gaussians over dense local spatio-temporal features


Authors: Meng Chen (1)
Liyu Gong (2)
Tianjiang Wang (1)
Qi Feng (1)

1. School of Computer Science and Technology ; Huazhong University of Science and Technology ; Wuhan ; 430074 ; China
2. Eedoo Inc ; Beijing ; 100085 ; China
Keywords: Action recognition ; Dense sampling ; Local spatio-temporal feature ; Gaussian mixture model ; Lie algebrized gaussians
Journal: Multimedia Tools and Applications
Publication year: 2015
Publication date: March 2015
Volume: 74
Issue: 6
Pages: 2127-2142
Full-text size: 795 KB
References: 1. Bregonzio M, Gong S, Xiang T (2009) Recognising action as clouds of space-time interest points. In: IEEE conference on computer vision and pattern recognition
2. Chang C, Lin C (2011) LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2: pp. 1-27
3. Dollár P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse spatio-temporal features. In: IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance
4. Gilbert A, Illingworth J, Bowden R (2009) Fast realistic multi-action recognition using mined dense spatio-temporal features. In: IEEE international conference on computer vision
5. Gong L, Chen M, Hu C (2013) Lie algebrized gaussians for image representation. arXiv:1304.0823v1
6. Kläser A, Marszalek M, Schmid C (2008) A spatio-temporal descriptor based on 3d-gradients. In: British machine vision conference
7. Kovashka A, Grauman K (2010) Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In: IEEE conference on computer vision and pattern recognition
8. Laptev I, Lindeberg T (2003) Space-time interest points. In: IEEE international conference on computer vision
9. Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: IEEE conference on computer vision and pattern recognition
10. Le Q, Zou W, Yeung S, Ng A (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: IEEE conference on computer vision and pattern recognition
11. Lin Z, Jiang Z, Davis LS (2009) Recognizing actions by shape-motion prototype trees. In: IEEE international conference on computer vision
12. Liu J, Luo J, Shah M (2009) Recognizing realistic actions from videos "in the wild". In: IEEE conference on computer vision and pattern recognition
13. Liu J, Shah M (2008) Learning human actions via information maximization. In: IEEE conference on computer vision and pattern recognition
14. Liu L, Wang L, Liu X (2011) In defense of soft-assignment coding. In: IEEE international conference on computer vision
15. O'Hara S, Draper B (2012) Scalable action recognition with a subspace forest. In: IEEE conference on computer vision and pattern recognition
16. Oikonomopoulos A, Patras I, Pantic M (2005) Spatio-temporal salient points for visual recognition of human actions. IEEE Trans Syst Man Cybern, Part B: Cybern 36: pp. 710-719
17. Reynolds DA, Quatieri TF, Dunn RB (2000) Speaker verification using adapted gaussian mixture models. Digit Signal Process 10: pp. 19-41
18. Rodriguez MD, Ahmed J, Shah M (2008) Action mach: A spatio-temporal maximum average correlation height filter for action recognition. In: IEEE conference on computer vision and pattern recognition
19. Schüldt C, Laptev I, Caputo B (2004) Recognizing human actions: A local svm approach. In: International conference on pattern recognition
20. Scovanner P, Ali S, Shah M (2007) A 3-dimensional sift descriptor and its application to action recognition. In: ACM international conference on multimedia
21. van Gemert JC, Veenman CJ, Smeulders AWM, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intell 32: pp. 1271-1283
22. Wang H, Ullah MM, Kläser A, Laptev I, Schmid C (2009) Evaluation of local spatio-temporal features for action recognition. In: British machine vision conference
23. Wang J, Chen Z, Wu Y (2011) Action recognition with multiscale spatio-temporal contexts. In: IEEE conference on computer vision and pattern recognition
24. Willems G, Tuytelaars T, Gool LV (2008) An efficient dense and scale-invariant spatio-temporal interest point detector. In: European conference on computer vision
25. Wong S, Cipolla R (2007) Extracting spatio-temporal interest points using global information. In: IEEE international conference on computer vision
26. Wu X, Xu D, Duan L, Luo J (2011) Action recognition using context and appearance distribution features. In: IEEE conference on computer vision and pattern recognition
27. Yan S, Zhou X, Liu M, Hasegawa-Johnson M, Huang TS (2008) Regression from patch-kernel. In: IEEE conference on computer vision and pattern recognition
28. Yeffet L, Wolf L (2009) Local trinary patterns for human action recognition. In: IEEE international conference on computer vision
29. Zhou X, Cui N, Li Z, Liang F, Huang TS (2009) Hierarchical gaussianization for image classification. In: IEEE international conference on computer vision
30. Zhou X, Zhuang X, Yan S, Chang S, Hasegawa-Johnson M, Huang TS (2008) Sift-bag kernel for video event analysis. In: ACM international conference on multimedia
Journal category: Computer Science
Journal subjects: Multimedia Information Systems
Computer Communication Networks
Data Structures, Cryptology and Information Theory
Special Purpose and Application-Based Systems
Publisher: Springer Netherlands
ISSN: 1573-7721

Abstract

This paper presents a novel framework for human action recognition based on a newly proposed mid-level feature representation named Lie Algebrized Gaussians (LAG). Because an action sequence can be treated as a 3D object in space-time, we cast action recognition as 3D object recognition and characterize each 3D object by the probability distribution of its local spatio-temporal features. First, for each video, we densely sample local spatio-temporal features (e.g. HOG3D) at multiple scales within the bounding boxes of the human body. Normalized spatial coordinates are appended to each local descriptor to capture spatial position information. The distribution of local features in each video is then modeled by a Gaussian Mixture Model (GMM). To estimate the parameters of the video-specific GMMs, a global GMM is trained on all training data and each video-specific GMM is adapted from it. LAG is then used to vectorize the video-specific GMMs. Finally, a linear SVM is employed for classification. Experimental results on the KTH and UCF Sports datasets show that our method achieves state-of-the-art performance.
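
The two middle steps of the pipeline described in the abstract (appending normalized coordinates, then adapting a video-specific GMM from a global one) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. It assumes diagonal covariances, adapts only the component means in the style of the MAP adaptation of Reynolds et al. (2000), and uses a relevance factor of 16 as an arbitrary illustrative value. The function names, the bounding-box layout (x0, y0, width, height), and the parameter names are all hypothetical. The LAG vectorization itself is not reproduced here.

```python
import numpy as np


def append_normalized_xy(descriptors, xy, box):
    """Append (x, y), normalized to the bounding box, to each descriptor.

    descriptors: (n, d) local features; xy: (n, 2) pixel positions;
    box: (x0, y0, width, height) of the person (hypothetical layout).
    """
    x0, y0, w, h = box
    norm = (xy - np.array([x0, y0], dtype=float)) / np.array([w, h], dtype=float)
    return np.hstack([descriptors, norm])


def posteriors(X, weights, means, variances):
    """Responsibility of each diagonal-covariance component for each row of X."""
    diff = X[:, None, :] - means[None, :, :]                      # (n, K, d)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_w = np.log(weights) + log_pdf                             # (n, K)
    log_w -= log_w.max(axis=1, keepdims=True)                     # numerical stability
    p = np.exp(log_w)
    return p / p.sum(axis=1, keepdims=True)


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a global GMM to one video's features.

    Components that see many features move toward the data mean;
    components that see almost none stay at the global mean.
    """
    post = posteriors(X, weights, means, variances)               # (n, K)
    n_k = post.sum(axis=0)                                        # soft counts per component
    data_mean = post.T @ X / np.maximum(n_k, 1e-12)[:, None]      # (K, d)
    alpha = (n_k / (n_k + relevance))[:, None]                    # adaptation weight in [0, 1)
    return alpha * data_mean + (1.0 - alpha) * means
```

In use, the global weights, means, and variances would come from a GMM fitted on features pooled over all training videos, and `map_adapt_means` would be called once per video. Stacking the adapted means with `.ravel()` gives a fixed-length vector per video that a linear SVM could consume, but that stacking is a simpler stand-in and should not be read as the LAG representation.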
