Action recognition using lie algebrized gaussians over dense local spatio-temporal features
  • Authors: Meng Chen (1)
    Liyu Gong (2)
    Tianjiang Wang (1)
    Qi Feng (1)

    1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
    2. Eedoo Inc, Beijing 100085, China
  • Keywords: Action recognition ; Dense sampling ; Local spatio-temporal feature ; Gaussian mixture model ; Lie algebrized gaussians
  • Journal: Multimedia Tools and Applications
  • Publication year: 2015
  • Publication date: March 2015
  • Volume: 74
  • Issue: 6
  • Pages: 2127-2142
  • Full-text size: 795 KB
  • Journal category: Computer Science
  • Journal subjects: Multimedia Information Systems
    Computer Communication Networks
    Data Structures, Cryptology and Information Theory
    Special Purpose and Application-Based Systems
  • Publisher: Springer Netherlands
  • ISSN: 1573-7721
Abstract
This paper presents a novel framework for human action recognition based on a newly proposed mid-level feature representation method named Lie Algebrized Gaussians (LAG). Because an action sequence can be treated as a 3D object in space-time, we address action recognition as the recognition of 3D objects, and we characterize each 3D object by the probability distribution of its local spatio-temporal features. First, for each video, we densely sample local spatio-temporal features (e.g. HOG3D) at multiple scales, confined to the bounding boxes of the human body. Normalized spatial coordinates are appended to each local descriptor in order to capture spatial position information. The distribution of local features in each video is then modeled by a Gaussian Mixture Model (GMM). To estimate the parameters of the video-specific GMMs, a global GMM is trained on all training data and the video-specific GMMs are adapted from the global GMM. LAG is then used to vectorize the video-specific GMMs. Finally, a linear SVM is employed for classification. Experimental results on the KTH and UCF Sports datasets show that our method achieves state-of-the-art performance.
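
The adaptation step in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal-covariance global GMM whose parameters are already available, and it adapts only the component means, in the style of GMM adaptation used for speaker verification. The function names, the relevance factor of 16, and the random stand-in descriptors are all hypothetical. The last line stacks the adapted means into one vector as a simple placeholder for the LAG vectorization, which the abstract does not specify.

```python
import numpy as np

def append_normalized_coords(desc, xy, box):
    """Append (x, y) positions, normalized by the bounding box, to descriptors.

    desc: (N, D) local descriptors; xy: (N, 2) pixel positions;
    box: (x0, y0, width, height) of the human bounding box (assumed layout).
    """
    x0, y0, w, h = box
    norm = (xy - np.array([x0, y0])) / np.array([w, h])
    return np.hstack([desc, norm])

def map_adapt_means(feats, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance global GMM.

    feats: (N, D); weights: (K,); means and variances: (K, D).
    relevance is an assumed smoothing factor, not a value from the paper.
    """
    # Log-density of every feature under every component, shape (N, K).
    diff = feats[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    # Posterior responsibilities, computed stably in the log domain.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    n_k = post.sum(axis=0)                                    # soft counts (K,)
    first = post.T @ feats / np.maximum(n_k, 1e-10)[:, None]  # data means (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]
    # Interpolate between the video's statistics and the global means.
    return alpha * first + (1 - alpha) * means

# Toy usage with random stand-ins for HOG3D descriptors.
rng = np.random.default_rng(0)
desc = rng.normal(size=(200, 4))
xy = rng.uniform([10, 20], [110, 220], size=(200, 2))
feats = append_normalized_coords(desc, xy, box=(10, 20, 100, 200))

K, D = 3, feats.shape[1]
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

adapted = map_adapt_means(feats, weights, means, variances)
video_vector = adapted.ravel()  # placeholder vectorization, not LAG itself
```

Components that receive few features from a video stay close to the global means, so every video yields a vector of the same length with corresponding entries, which is what allows a linear classifier to be applied afterwards.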
