Human Pose Estimation and Action Recognition from Image Sequences
Abstract
Action recognition and behavior understanding are active topics in computer vision and pattern recognition, with broad application prospects in advanced human-computer interaction, intelligent video surveillance, virtual reality, and related fields. This thesis studies human pose estimation and action recognition from image sequences, addressing the reduction of the high-dimensional state vector in pose estimation, the ambiguity of estimating 3D pose from monocular images, feature extraction and representation for action recognition, and classifier design and modeling.
     This thesis studies 3D human pose estimation based on nonlinear manifold learning, proposing the Temporal Neighbor Preserving Embedding (TNPE) algorithm to obtain a low-dimensional manifold space that reflects the intrinsic structure of human motion. Within a learning-based pose estimation framework, a Bayesian Mixture of Experts (BME) model captures the nonlinear mapping from the low-dimensional manifold space to the high-dimensional pose space. To compute each expert's weight, a Gaussian Mixture Model (GMM) models the data distribution in the manifold space probabilistically, yielding each expert's prior probability and prior distribution. Experiments show that the method estimates human pose accurately.
     This thesis proposes a 3D human pose estimation framework based on semantic knowledge feedback, which uses high-level semantic knowledge of human motion to guide pose estimation in a top-down manner, reducing the ambiguity and uncertainty of monocular 3D pose estimation. A global temporal motion template represents the temporal ordering constraints between poses within a motion; local spatial motion correlation functions represent the motion constraints between body parts. The motion template and the correlation functions are defined as global and local semantic knowledge respectively, and both are used to filter and update the candidate poses obtained from a coarse estimate, yielding more accurate results. Experiments show that introducing high-level semantic feedback effectively improves estimation accuracy.
     This thesis proposes Incremental Discriminant-Analysis of Canonical Correlations (IDCC) for recognizing human actions in complex environments. The method projects all actions into a new space via a discriminant projection matrix, maximizing the correlation coefficients between actions of the same class and minimizing those between actions of different classes. Because a person's appearance changes continually under the influence of the surroundings, the discriminant matrix is updated by incremental learning, so the discriminant model adapts itself as the data change and the effect of environmental variation on recognition is reduced. Experiments on several action databases show that IDCC recognizes irregular actions robustly even in complex, changing environments.
     This thesis studies action recognition based on spatio-temporal interest points and proposes a multi-scale spatio-temporal distribution bag-of-words model. The model captures the spatio-temporal distribution of interest points within local regions of different spatio-temporal scales in a video, describing the spatio-temporal context between interest points at multiple levels. In parallel, an appearance bag-of-words model captures the appearance information of the interest points. The multi-scale distribution features and the appearance features describe, from two different perspectives, the "where" and "what" properties of the interest points; a multiple kernel learning method fuses the two features into a more descriptive and discriminative representation. The resulting recognition method requires no preprocessing such as object detection or human tracking, and achieves satisfactory results under noise and shadows, camera shake, and low video resolution. Experiments on single-view and multi-view action databases demonstrate its effectiveness.
Human action analysis and recognition is a highly active research area in computer vision and pattern recognition, with many promising applications including human-computer interaction, intelligent surveillance, virtual reality, and motion analysis. In this thesis, we focus on 3D human pose estimation and action recognition from image sequences. We mainly address the high dimensionality of the pose space and the ambiguity of human pose estimation, as well as feature representation and classifier design for action recognition.
     A novel manifold learning method, called temporal neighbor preserving embedding (TNPE), is proposed to learn the low-dimensional intrinsic manifold of human motion within a learning-based framework for 3D human pose estimation. It alleviates the high dimensionality of both the image feature space and the 3D pose space by exploiting the strong constraints inherent in natural human motion. A Bayesian mixture of experts (BME) model establishes the nonlinear mapping from the low-dimensional space to the high-dimensional pose space, with each expert handling a linear mapping in a local region. To compute the gating weight of each expert, a Gaussian mixture model (GMM) approximates the probability distribution over the manifold space, yielding the prior probabilities and distribution models of the experts. Experimental results on 3D hand and body pose estimation show encouraging stability and accuracy.
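As an illustration only, the temporal-neighbor idea behind TNPE can be sketched in the style of locally linear embedding: reconstruct each frame from its temporally adjacent frames, then find low-dimensional coordinates that preserve the reconstruction weights. This is a toy sketch under stated assumptions, not the thesis algorithm (the thesis's TNPE learns an NPE-style linear projection; here the points are embedded directly):

```python
import numpy as np

def tnpe_embed(X, k=2, d=2):
    """Toy temporal-neighbor embedding (illustrative, not the thesis's TNPE).

    X: (n, D) temporally ordered feature vectors. Each frame is
    reconstructed from its k preceding/following frames; the embedding
    preserves those weights via an LLE-style eigenproblem.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # temporal neighbors: frames within +/- k of frame i
        idx = [j for j in range(i - k, i + k + 1) if 0 <= j < n and j != i]
        Z = X[idx] - X[i]                      # neighbors centered at frame i
        G = Z @ Z.T + 1e-6 * np.eye(len(idx))  # regularized local Gram matrix
        w = np.linalg.solve(G, np.ones(len(idx)))
        W[i, idx] = w / w.sum()                # reconstruction weights sum to 1
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                    # skip the trivial eigenvector

Y = tnpe_embed(np.random.RandomState(0).randn(60, 12), k=3, d=2)
```

Because neighbors are chosen by temporal adjacency rather than Euclidean distance, frames that look similar but occur in different motion phases are kept apart in the embedding.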
     To alleviate the ambiguities caused by perspective projection from the 3D scene onto the 2D image plane, a novel framework based on semantic feedback for 3D human pose estimation is presented, which incorporates high-level motion knowledge to guide pose estimation. A global temporal motion template is built to capture the temporal coherence between time-ordered poses, and local spatial motion correlations are created to preserve the nonlinear relationships between different body parts. The semantic knowledge, represented by both the temporal motion template and the spatial motion correlations, is used to rule out implausible pose hypotheses and yield more accurate estimates. Experiments on the CMU Mocap database demonstrate that our method achieves higher estimation accuracy than methods without semantic feedback.
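The feedback step can be pictured as re-scoring the coarse estimator's pose hypotheses with the two kinds of knowledge. The sketch below is entirely hypothetical: the function names, the cluster-transition table standing in for the temporal template, and the toy correlation score are assumptions, not the thesis's actual formulation:

```python
import numpy as np

def semantic_filter(candidates, prev_cluster, transition, corr, alpha=0.5):
    """Hypothetical semantic-feedback re-scoring of pose hypotheses.

    candidates: list of (pose_vector, cluster_id) from a coarse estimator.
    transition[a][b]: learned probability that pose cluster a is followed
                      by cluster b (stand-in for the global temporal template).
    corr(pose): score of how consistently body parts move together
                (stand-in for the local spatial correlation functions).
    """
    best, best_score = None, -np.inf
    for pose, cluster in candidates:
        temporal = transition[prev_cluster][cluster]  # global knowledge
        spatial = corr(pose)                          # local knowledge
        score = alpha * temporal + (1 - alpha) * spatial
        if score > best_score:
            best, best_score = (pose, cluster), score
    return best

# Toy data: two clusters, cluster 1 tends to stay in cluster 1.
transition = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
corr = lambda pose: -abs(pose[0] - pose[1])           # toy correlation score
cands = [(np.array([0.1, 0.9]), 0), (np.array([0.5, 0.5]), 1)]
picked = semantic_filter(cands, prev_cluster=1,
                         transition=transition, corr=corr)
```

The point of the sketch is only the control flow: hypotheses that violate the temporal ordering or the inter-part correlations receive low combined scores and are filtered out.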
     A novel incremental learning method, namely Incremental Discriminant-Analysis of Canonical Correlations (IDCC), is proposed and applied to action recognition. It uses a discriminant matrix to project all training actions into a new space, where the canonical correlations of actions within the same class are maximized and those of actions from different classes are minimized. To capture the large changes in human appearance across complex scenarios, the discriminant matrix of IDCC is incrementally updated with new training data, which facilitates recognition in changing environments. Experiments on both regular and irregular action datasets demonstrate that the proposed method recognizes human actions with high accuracy and robustness in various non-stationary scenarios.
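The quantity IDCC discriminates over, the canonical correlations between two action sequences, can be computed as the singular values of the product of orthonormal subspace bases. The sketch below shows only that computation (no discriminant learning and no incremental update), with the subspace dimension `d=3` chosen arbitrarily:

```python
import numpy as np

def canonical_correlations(A, B, d=3):
    """Canonical correlations between two action sequences (a sketch).

    A, B: (n_frames, n_features) matrices. Each sequence is summarized by
    a d-dimensional orthonormal basis of its frames (via SVD); the
    canonical correlations are the singular values of the product of the
    two bases and lie in [0, 1], largest first.
    """
    Qa = np.linalg.svd(A.T, full_matrices=False)[0][:, :d]
    Qb = np.linalg.svd(B.T, full_matrices=False)[0][:, :d]
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

rng = np.random.RandomState(0)
X = rng.randn(20, 10)
# A slightly perturbed copy of the same action: correlations near 1.
rho_same = canonical_correlations(X, X + 0.01 * rng.randn(20, 10))
# An unrelated action: typically lower correlations.
rho_diff = canonical_correlations(X, rng.randn(20, 10))
```

Same-class sequences yield correlations close to 1, which is exactly the separation IDCC's discriminant projection is trained to sharpen and to maintain as new data arrive.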
     A novel action descriptor based on spatio-temporal interest points is proposed for action recognition. The descriptor uses multiple bags of spatio-temporal distribution words to capture the spatio-temporal relationships between interest points over local regions of different space-time scales in a video, and a bag of appearance words to capture the appearance information of the interest points. The distribution words and the appearance words respectively characterize the "where" and "what" properties of interest points. A multiple kernel learning method is introduced to adaptively combine these two features into a more descriptive and discriminative representation for recognition. The proposed method requires no preprocessing of the action video, such as object detection or human body tracking, and is robust to noise, camera movement, and low-resolution video. Experiments on both single-view and multi-view datasets show the effectiveness and robustness of the method.
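One common way to combine two bag-of-words histogram features is a weighted sum of chi-square kernels; multiple kernel learning would learn the weight from data, so the fixed `beta` below is a stand-in, and the histogram sizes are arbitrary. A minimal sketch under those assumptions:

```python
import numpy as np

def chi2_kernel(H1, H2, gamma=1.0):
    """Chi-square kernel between two sets of L1-normalized histograms."""
    d = ((H1[:, None, :] - H2[None, :, :]) ** 2 /
         (H1[:, None, :] + H2[None, :, :] + 1e-10)).sum(-1)
    return np.exp(-gamma * d)

def combined_kernel(app1, app2, dist1, dist2, beta=0.5):
    """Convex combination of the 'what' (appearance-word) kernel and the
    'where' (distribution-word) kernel. MKL would learn beta jointly with
    the classifier; a fixed beta is used here as a stand-in."""
    return (beta * chi2_kernel(app1, app2) +
            (1 - beta) * chi2_kernel(dist1, dist2))

rng = np.random.RandomState(1)
app = rng.dirichlet(np.ones(20), size=6)   # appearance-word histograms
dist = rng.dirichlet(np.ones(50), size=6)  # distribution-word histograms
K = combined_kernel(app, app, dist, dist)  # Gram matrix for 6 videos
```

The resulting Gram matrix `K` can be fed directly to any kernel classifier (e.g. an SVM with a precomputed kernel), which is how bag-of-words action features are typically used.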