Video-Based Human Pose Estimation and Tracking
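Abstract
Video-based human pose estimation and tracking has wide applications in intelligent video surveillance, human-computer interaction, and related fields, and is attracting growing attention. However, owing to the high-dimensional state space, cluttered backgrounds, occlusion, and changes in illumination and appearance, estimating human pose in video remains a very difficult problem.
     This work studies human pose estimation and tracking from monocular vision. The specific questions addressed are: how to extract multiple cues from images to estimate 2D human pose; how to discover latent variables while preserving the local structure of the data; how to obtain a segmentation of the data and its noise points at the same time; and how to combine multiple cues into a human motion model.
     The main contributions of this work are summarized as follows: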
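     1) In most 2D human pose estimation work, color- and shape-based limb detection is used to initialize the model, but because of the limitations of templates, limb detection works only in specific environments. Inspired by the manifold learning method Isomap, this work first uses a local-maximum criterion on geodesic distance to locate the outermost key points of the body. After initialization, a method similar to belief propagation uses kinematic, appearance, and angle constraints to estimate the remaining joint positions layer by layer. The angle constraint rests on the following assumption: human motion is non-rigid, but local limb changes can be approximated by rigid motion, so an abrupt change of angle on the silhouette indicates a joint, and the larger the change, the higher the probability that a joint is present. This work models that probability with the law of cosines.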
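     2) In discriminative 3D human pose estimation, a regression model learns the statistical relationship between image features and human poses and uses it to predict the pose in a test image. However, parametric regression models establish only a unimodal relationship between observations and targets. Non-parametric models rely too heavily on local information and are hard to extend to high-dimensional data: as dimensionality grows, they suffer from sparse neighborhoods. In addition, such memory-based methods must store all the training data, which greatly increases space complexity. This work proposes a global parametric model with the flexibility of a non-parametric one. It treats pose estimation as density estimation, models the joint probability density of the data space with Latent Gaussian Mixture Regression (LGMR), and obtains the regression function explicitly from the global conditional density. It also uses Locality Preserving Projections (LPP) to learn the manifold space of the target data and exploits their locality-preserving property to build explicit, global, bidirectional mappings between the original space and the manifold space.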
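     3) To address the over-fitting that remains in the Latent Gaussian Mixture Regression model, this work proposes VLGMR, a semi-parametric regression model based on variational inference. The variational distribution approximates the true posterior and satisfies a set of conditions that simplify the derivation. The prediction that maximizes the log-likelihood is then obtained under the criterion of minimizing the KL divergence between the posterior and the variational distribution.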
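     4) Canonical Correlation Analysis (CCA) combines information from the observations and the targets to find basis matrices that maximize the correlation of the low-dimensional projections, but it cannot preserve the local structure of the data. Methods such as LPP can only compute separate low-dimensional representations of the observation and target variables; they cannot combine both to find a locality-preserving latent representation. This work proposes the Canonical Local Preserving Latent Variable Model (CLPLVM), which preserves the local structure of the data while the low-dimensional projections of image features and pose data reach maximum correlation.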
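     5) Because video contains a great deal of temporal redundancy, modeling human motion in video differs substantially from modeling human pose in a single image. Motions of the same type should share similar latent information. This work computes a low-rank representation of the image features and motion data: it extracts a dictionary from the motion video and represents the pose in each frame with that dictionary, so the matrix representing the motion becomes low-rank. While recovering the motion representation, the method also detects noise points in the original data.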
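     6) Human motion models learned from motion capture data often generalize poorly. In addition to using deep learning, this work incorporates image information into the cost function of the conditional restricted Boltzmann machine, so that image information adjusts the motion model and the predicted motion agrees better with the current frame.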
Vision-based pose estimation has been a focus of much computer vision research because of its many applications to marker-less motion capture in activity recognition and human-computer interaction. Despite this effort, monocular pose estimation remains difficult; challenges include the high dimensionality of the state space, image clutter, occlusion, and lighting and appearance variations, to name a few.
     This work addresses key issues of human pose estimation and pose tracking in monocular video and proposes solutions to important problems in both. The research covers the following topics: how to combine multiple cues extracted from images to predict 2D human pose; how to explore latent variables while preserving local structure; how to segment subspaces and detect noise at the same time; and how to combine multiple cues to learn the dynamic model. The contributions can be described as follows:
     1) In most work on 2D human pose estimation, color-based or shape-based limb detection is used to initialize the model, but these approaches work only in specific situations because of the limitations of the template. Motivated by Isomap in manifold learning, we propose a hierarchical human pose estimation method. Seed points are extracted using geodesic distance. Less distinguishable joints are then detected using several cues: a kinematic constraint, an appearance constraint, and a curvature constraint. The curvature constraint rests on the assumption that human motion is a non-rigid transformation, but the local transformation between limbs can be approximated by a rigid one. Hence, a sudden change of angle on the silhouette indicates a joint, and the larger the angle variation, the higher the probability that a joint exists. We use the law of cosines to model these probability distributions.
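The curvature cue in contribution 1 can be illustrated with a minimal sketch. It assumes the silhouette is given as an ordered list of 2D contour points and maps the interior angle at each point to a score in [0, 1] through the law of cosines. The function name `joint_scores`, the neighbor offset `k`, and the mapping `(1 + cos θ) / 2` are illustrative assumptions, not the author's actual probability model.

```python
import math

def joint_scores(contour, k=1):
    """Score each contour point by how sharply the outline bends there.

    Hypothetical helper: 0.0 means a straight outline, values near 1.0
    mean a sharp fold. End points without both neighbors keep score 0.0.
    """
    n = len(contour)
    scores = [0.0] * n
    for i in range(k, n - k):
        prev_pt, pt, next_pt = contour[i - k], contour[i], contour[i + k]
        a = math.dist(pt, prev_pt)       # side from the point to its predecessor
        b = math.dist(pt, next_pt)       # side from the point to its successor
        c = math.dist(prev_pt, next_pt)  # side opposite the angle at the point
        if a == 0.0 or b == 0.0:
            continue                     # duplicate points: angle undefined
        # Law of cosines: c^2 = a^2 + b^2 - 2ab*cos(theta)
        cos_theta = (a * a + b * b - c * c) / (2.0 * a * b)
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        # Assumed mapping: theta = pi (straight) -> 0, theta = 0 (fold) -> 1
        scores[i] = 0.5 * (1.0 + cos_theta)
    return scores

# An L-shaped outline with a right-angle bend at index 2.
outline = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
scores = joint_scores(outline)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

On real silhouettes a larger `k` would smooth out pixel-level jaggedness; the value to use depends on contour resolution and is not specified in the abstract.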
     2) In discriminative approaches, regression models learn the statistical relation between image features and human poses and use it to infer the pose for a test image. However, simple parametric models cannot handle the multi-modal nature of the problem, while non-parametric methods cannot model arbitrarily complex relationships between input features and output poses because they are limited by the dimensionality and availability of the training data. Moreover, these memory-based methods need a large amount of space to store all the training data, which makes them impractical in real situations. We learn a multi-modal joint density model over image features and 3D poses in the form of a Gaussian Mixture Model (GMM). The GMM lets us handle multi-modality in the data and derive explicit conditional distributions for inference, in the form of Latent Gaussian Mixture Regression (LGMR). We also propose using Locality Preserving Projections (LPP), which learn a linear mapping yet can discover non-linear manifold structure. LPP also provides closed-form forward and backward mappings between the latent space(s) and the input/output space(s).
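How a joint GMM yields an explicit conditional distribution can be sketched with plain Gaussian mixture regression. The sketch below assumes a scalar feature x and a scalar pose y, with mixture parameters given rather than learned; it omits the latent-space (LPP) step entirely, and the names `gmr_predict` and `gaussian_pdf` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmr_predict(x, components):
    """Condition a joint GMM over (x, y) on an observed x.

    components: list of (weight, mean_x, mean_y, var_x, cov_xy) tuples.
    Returns the conditional mean E[y | x] and, per component, the pair
    (responsibility, conditional mean of that component).
    Assumes x is not so far from every component that all densities
    underflow to zero.
    """
    resp = [w * gaussian_pdf(x, mx, vx) for (w, mx, my, vx, cxy) in components]
    total = sum(resp)
    resp = [r / total for r in resp]
    # Conditional mean of each Gaussian: mu_y + cov_xy / var_x * (x - mu_x)
    modes = [my + cxy / vx * (x - mx) for (w, mx, my, vx, cxy) in components]
    mean = sum(r * m for r, m in zip(resp, modes))
    return mean, list(zip(resp, modes))

# Two components sharing the same feature distribution but opposite poses,
# a toy stand-in for an ambiguous image feature.
mixture = [
    (0.5, 0.0, 1.0, 1.0, 0.5),
    (0.5, 0.0, -1.0, 1.0, -0.5),
]
mean, per_component = gmr_predict(2.0, mixture)
print(mean, per_component)
```

In this toy case the two per-component predictions are equally likely and opposite in sign, so the overall conditional mean falls between them on a pose that neither component supports. That is the failure of a unimodal regressor described above, and it is why keeping the per-component predictions is useful.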
     3) To deal with the over-fitting problem in Latent Gaussian Mixture Regression for human pose estimation, we develop a semi-parametric regression model in latent space with variational inference. The variational distribution has free parameters that are adjusted to approximate the posterior distribution, and it is restricted to a factorized form. It is obtained by maximizing the marginal log-likelihood while minimizing the KL divergence between the posterior and the variational distribution.
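The link between the marginal log-likelihood and the KL divergence can be written out with the standard variational decomposition. The symbols below are generic (observations y, latent variables and parameters z, variational distribution q) and are not taken from the author's derivation.

```latex
\log p(\mathbf{y})
  = \mathcal{L}(q)
  + \mathrm{KL}\bigl(q(\mathbf{z}) \,\|\, p(\mathbf{z}\mid\mathbf{y})\bigr),
\qquad
\mathcal{L}(q) = \int q(\mathbf{z})\,
  \log\frac{p(\mathbf{y},\mathbf{z})}{q(\mathbf{z})}\,d\mathbf{z},
\qquad
q(\mathbf{z}) = \prod_{i} q_i(\mathbf{z}_i).
```

Because the left-hand side does not depend on q and the KL term is non-negative, raising the lower bound over the factorized family is the same as reducing the KL divergence between q and the true posterior.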
     4) Canonical Correlation Analysis (CCA) finds basis matrices that maximize the correlation in the dimension-reduced space, but data points that are highly similar in both observations and poses in the original space can end up far apart in the reduced space. LPP can only map one side to a locality-preserving low-dimensional space, or reduce the feature and pose dimensionality separately, so it cannot guarantee that both input and output preserve local structure. We extend our work and propose the Canonical Local Preserving Latent Variable Model (CLPLVM). A cost function is constructed to learn latent variables that preserve the local structure of both input and output in the original space while maximizing the correlation between input and output in the latent space.
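For reference, the two standard objectives that this contribution builds on can be stated as follows. These are the textbook forms of CCA and LPP in generic notation (covariance blocks C, data matrix X, affinity matrix W with degree matrix D); how CLPLVM combines them is not specified in the abstract and is not reproduced here.

```latex
% CCA: maximize the correlation between the projected views
\max_{\mathbf{w}_x,\mathbf{w}_y}\;
\frac{\mathbf{w}_x^{\top} C_{xy}\,\mathbf{w}_y}
     {\sqrt{\mathbf{w}_x^{\top} C_{xx}\,\mathbf{w}_x}\,
      \sqrt{\mathbf{w}_y^{\top} C_{yy}\,\mathbf{w}_y}}

% LPP: keep neighboring points close after a linear projection
\min_{\mathbf{a}}\;
\sum_{i,j}\bigl(\mathbf{a}^{\top}\mathbf{x}_i-\mathbf{a}^{\top}\mathbf{x}_j\bigr)^{2} W_{ij}
\quad\text{s.t.}\quad
\mathbf{a}^{\top} X D X^{\top}\mathbf{a}=1,
\qquad D_{ii}=\sum_{j} W_{ij}
```

The CCA objective contains no neighborhood term, and the LPP objective involves only one view at a time, which matches the two limitations described above.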
     5) Modeling human motion from video differs from pose estimation from a single frame: video sequences usually have very high temporal redundancy, which should be exploited for better performance, and human poses of the same motion type should share similar underlying structure. Thus, if the features of the motion in a video are represented over a dictionary, the representation matrix becomes low-rank. The problem of modeling motion from a video is thereby converted into learning a low-rank representation matrix over a dictionary. Moreover, the method detects corrupted observations at the same time.
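A common way to pose such a problem is the standard low-rank representation program shown below. It is given here as an assumption about the general form, with generic symbols: X is the data matrix with one column per frame, A the dictionary, Z the representation matrix, E the corruption term, and λ a trade-off weight. The author's exact objective may differ.

```latex
\min_{Z,\,E}\;\|Z\|_{*} + \lambda\,\|E\|_{2,1}
\quad\text{s.t.}\quad X = AZ + E
```

The nuclear norm encourages a low-rank Z, while the column-wise norm on E pushes the corruption onto a few columns, so frames with large columns in E can be flagged as corrupted observations.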
     6) Much effort has been spent on learning the dynamic model from motion capture databases, while information from the videos, which is coupled to knowledge of the human pose, has been discarded entirely. We add connections between the images and the poses/latents, turning the conditional restricted Boltzmann machine (CRBM) model into a multi-cue CRBM (MC-CRBM) model. Combining the constraints from different cues yields a better prior model.
引文
[1]C.M. Bishop, et al. Pattern Recognition and Machine Learning, New York, Springer-Verlag,2006,137-173.
    [2]Hampapur A., Brown L.. Connell J., et al. Smart Video Surveillance:Exploring the Concept of Multiscale Spatiotemporal Tracking, Signal Processing Magazine,22(2),2005,38-51.
    [3]Hu W., Tan T., Wang L., et al. A Survey on Visual Surveillance of Object Motion and Behaviors, IEEE Transactions on Systems. Man, and Cybernetics, Part C:Applications and Reviews,34(3),2004,334-352.
    [41 Hilton A., Pascal F.. Modeling People toward Vision-Based Understanding of a Person's Shape, Appearance, and Movement, Computer Vision and Image Understanding,81(3),2001,227-230.
    [5]Hampapur A., Brown L., Connell J., et al., Smart Surveillance:Applications, Technologies and Implications, Proceedings of Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia,2003, 1133-1138.
    [6]Thomas, V. and Chawla, N.V. and Bowyer, K.W. and Flynn, P.J., Learning to predict gender from iris images, First IEEE International Conference on Biometrics:Theory, Applications, and Systems (BTAS),2007,1-5.
    [7]Hasler, N. and Stoll, C. and Rosenhahn, B. and Thormhlen, T. and Seidel, H.P., Estimating body shape of dressed humans. Computers & Graphics,33(3),2009,211-216.
    [8]Kim, M. and Pavlovic, V., Discriminative Learning of Dynamical Systems for Motion Tracking, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2007,1-8.
    [9]Peursum, P. and Venkatesh, S. and West, G., A study on smoothing for particle-filtered 3D human body tracking, International journal of computer vision,87(1),2010,53-74.
    [10]Szeliski, R., Computer vision:Algorithms and applications, Springer-Verlag New York Inc,2010.
    [11]Li, R. and Tian, T.P. and Sclaroff, S. and Yang, M.H.,3D human motion tracking with a coordinated mixture of factor analyzers, International journal of computer vision,87(1),2010,170-190.
    [12]Urtasun, R. and Fua, P.,3D human body tracking using deterministic temporal motion models, IEEE Proceedings of European Conference on Computer Vision,2004,92-106.
    [13]Jenkins, O.C. and Gonzalez, G. and Loper, M., Dynamical Motion Vocabularies for Kinematic Tracking and Activity Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop,2006,147-154.
    [14]Fossati, A. and Salzmann, M. and Fua, P., Observable subspaces for 3D human motion recovery, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2009,1137-1144.
    [15]Sigal, L. and Memisevic, R. and Fleet, D.J., Shared kernel information embedding for discriminative inference, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2009, 1063-6919.
    [16]K. Mikolajczyk and H. Uemura, Action recognition with motion-appearance vocabulary forest, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2008,1-8.
    [17]P. Eisert, P. Fechteler, J. Rurainsky,3-D Tracking of Shoes for Virtual Mirror Applications, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2008,152-159.
    [18]H. Sidenbladh, M. Black, D. Fleet, Stochastic tracking of 3d human figures using 2d image motion, IEEE Proceedings of European Conference on Computer Vision,2000,702-718.
    [19]C. Sminchisescu, B. Triggs, Covariance scaled sampling for monocular 3d body tracking, IEEE Proceedings of Computer Vision and Pattern Recognition,1,2001,440-447.
    [20]L. Sigal, A. Balan, and M.J. Black, Combined Discriminative and Generative Articulated Pose and Non-rigid Shape Estimation, Advances in Neural Information Processing Systems,2008,171-178.
    [21]A. Fathi, G. Mori, Human Pose Estimation Using Motion Exemplars, IEEE International Conference on Computer Vision,2007,1-8.
    [22]Urtasun R. Salzmann, M, Combining Discriminative and Generative Methods for 3D Deformable Surface and Articulated Pose Reconstruction, IEEE Conference on Computer Vision and Pattern Recognition,2010,1-8.
    [23]A. Agarwal, B. Triggs.3d human pose from silhouettes by relevance vector regression. IEEE Proceedings of Computer Vision and Pattern Recognition,2,2004,882-888.
    [24]A. Agarwal, B. Triggs, Monocular human motion capture with a mixture of regressors, IEEE Workshop on Computer Vision and Pattern Reeognition,2005,72-72.
    [25]A. Bissacco, M. Yang, S. Soatto, Fast human pose estimation using appearance and motion via multi-dimensional boosting regression, IEEE Proceedings of Computer Vision and Pattern Recognition, 2007,1-8.
    [26]L. Bo, C. Sminchisescu, Structured output-associative regression, IEEE Proceedings of Computer Vision and Pattern Recognition,2009,2403-2410.
    [27]A. M. Elgammal, and C.-S. Lee, Inferring 3d Body Pose from Silhouettes Using Activity Manifold Learning, IEEE Proceedings of Computer Vision and Pattern Recognition,2,2004,681-688.
    [28]A. Fathi, G. Mori, Human pose estimation using motion exemplars, IEEE Proceedings of International Conference on Computer Vision,2007,1-8.
    [29]F. Guo, G. Qian, Learning and inference of 3d human poses from Gaussian mixture modeled silhouettes. IEEE Proceedings of International Conference on Pattern Recognition,2,2006,43-47.
    [30]T. Jaeggli, E. Koller-Meier, L. Van Gool, Learning generative models for multi-activity body pose estimation, International Journal of Computer Vision,83,2009,121-134.
    [31]A. Kanaujia, C. Sminchisescu, D. Metaxas, Spectral latent variable models for perceptual inference, IEEE Proceedings of International Conference on Computer Vision,2007,1-8.
    [32]R. Navaratnam, A. Fitzgibbon, R. Cipolla, The joint manifold model for semi-supervised multi-valued regression, IEEE Proceedings of International Conference on Computer Vision,2007,1-8.
    [33]G. Shakhnarovich, P. Viola, and T. Darrell, Fast Pose Estimation with Parameter-sensitive Hashing, IEEE Proceedings of International Conference on Computer Vision,2003,750-757.
    [34]C. Sminchisescu, A. Kanaujia, Z. Li, D. Metaxas, Discriminative density propagation for 3d human motion estimation, IEEE Proceedings of Computer Vision and Pattern Recognition,1,2005,390-397.
    [35]C. Sminchisescu, A. Kanaujia, D. Metaxas, Learning joint top-down and bottom-up processes for 3d visual inference, IEEE Proceedings of Computer Vision and Pattern Recognition,2,2006,1743-1752.
    [36]R. Urtasun, T. Darrell, Sparse probabilistic regression for activity independent human pose inference, IEEE Proceedings of Computer Vision and Pattern Recognition,2008,1-8.
    [37]Wu, B. and Nevatia, R., Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors, International Journal of Computer Vision,75(2),2007, 247-266.
    [38]Demirdjian, D., Combining geometric-and view-based approaches for articulated pose estimation, IEEE Proceedings of European Conference on Computer Vision (ECCV),2004,183-194.
    [39]Gupta, A. and Chen, T. and Chen, F. and Kimber, D. and Davis, L.S., Context and observation driven latent variable model for human pose estimation, IEEE Proceedings of Computer Vision and Pattern Recognition,2008,1-8.
    [40]Lee, M.W. and Nevatia, R., Human pose tracking in monocular sequence using multilevel structured models, IEEE transactions on pattern analysis and machine intelligence,31(1),2008,27-38.
    [41]Sigal, L. and Fleet, D. and Troje, N. and Livne, M., Human attributes from 3D pose tracking, IEEE Proceedings of European Conference on Computer Vision,2010,243-257.
    [42]Agarwal, A. and Triggs, B., Monocular human motion capture with a mixture of regressors, IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshop,2005,72-79.
    [43]Agarwal, A. and Triggs, B., Recovering 3D human pose from monocular images, IEEE transactions on pattern analysis and machine intelligence,28(1),2006,44-58.
    [44]Sminchisescu, C. and Kanaujia, A. and Li, Z. and Metaxas, D., Conditional visual tracking in kernel space, Advances in neural information processing systems,18,2006,1249-1256.
    [45]Li, R. and Yang, M.H. and Sclaroff, S. and Tian, T.P., Monocular tracking of 3D human motion with a coordinated mixture of factor analyzers, IEEE Proceedings of European Conference on Computer Vision,2006,137-150.
    [46]T. Tian, R. Li, and S. Sclaro, Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Worshop, San Diego,2005,50-57.
    [47]Elgammal, A. and Lee, C.S., Inferring 3D body pose from silhouettes using activity manifold learning, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2,2004,681-688.
    [48]Lee, C.S. and Elgammal, A., Modeling view and posture manifolds for tracking, IEEE International Conference on Computer Vision,2007.1-8.
    [49]Jaeggli, T. and Koller-Meier, E. and Van Gool. L., Learning generative models for monocular body pose estimation, Proceedings of Asian conference on Computer vision,1,2007,608-617.
    [50]Jaeggli. T. and Koller-Meier, E. and Van Gool, L., Learning generative models for multi-activity body pose estimation, International journal of computer vision,83(2),2009,121-134.
    [51]Schwarz, L. and Mateus, D. and Navab. N., Multiple-activity human body tracking in unconstrained environments, International conference on Articulated Motion and Defonnable Objects,2010.192-202.
    [52]Y. Tian, L. Sigal, H. Badino, F. De la Tone, Y. Liu, Latent Gaussian mixture regression for human pose estimation, Asian Conference on Computer Vision,1,2010,238-245.
    [53]Brand, M., Shadow puppetry. International conference on Computer Vision,2,1999,1237-1244.
    [54]Taylor, G.W. and Hinton, G.E. and Roweis. S.T., Modeling human motion using binary latent variables. Advances in neural information processing systems,19,2007,1345-1352.
    [55]Taylor, G.W. and Sigal, L. and Fleet, D.J. and Hinton, G.E., Dynamical binary latent variable models for 3D human pose tracking, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2010,631-638.
    [56]Urtasun, R. and Fleet, DJ and Fua, P.,3D People Tracking with Gaussian Process Dynamical Models, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,1,2006,238-245.
    [57]Wang, J.M. and Fleet, D.J. and Hertzmann, A., Gaussian process dynamical models for human motion, IEEE transactions on pattern analysis and machine intelligence,30(2),2007,283-298.
    [58]Chen, J. and Kim, M. and Wang, Y. and Ji, Q., Switching Gaussian Process Dynamic Models for simultaneous composite motion tracking and recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2009,2655-2662.
    [59]J.Lafferty, A. McCallum, and F. Pereira, Conditional random fields:probabilistic models for segmenting and labelling sequence data, International conference on machine learning,2001,282-289.
    [60]A. Quattoni, M. Collins, and T. Darrell, Conditional Random Fields for Object Recognition, Advances in neural information processing systems,2004.
    [61]Morency, L.P. and Quattoni, A. and Darrell, T., Latent-dynamic discriminative models for continuous gesture recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007,1-8.
    [62]Urtasun, R. and Fleet, D.J. and Fua, P., Temporal motion models for monocular and multiview 3D human body tracking, Computer Vision and Image Understanding,104(2),2006,157-177.
    [63]Han, TX and Ning, H. and Huang, TS, Fusion by optimal dynamic mixtures of proposal distributions, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops,2009, 66-73.
    [64]Gall, J. and Yao, A. and Van Gool, L.,2d action recognition serves 3d human pose estimation, IEEE Proceedings of European Conference on Computer Vision,2010,425-438.
    [65]Brox, T. and Rosenhahn, B. and Gall, J. and Cremers, D., Combined region and motion-based 3D tracking of rigid and articulated objects, IEEE transactions on pattern analysis and machine intelligence,32(3),2009,402-415.
    [66]Enzweiler, M. and Eigenstetter, A. and Schiele, B. and Gavrila, D.M., Multi-cue pedestrian classification with partial occlusion handling, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2010,990-997.
    [67]Jain, A. K. and Dubes, R. C., Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, New Jersey,1988.
    [68]Kaufman, L. and Rousseeuew, P. J., Finding Groups in Data:An Introduction to Cluster Analysis, John Wiley & Sons, Hoboken,1990.
    [69]Jain, A. K., Duin, R. P. W., and Mao, J., Statistical pattern recognition:A review, IEEE Transactions on Pattern Analysis and Machine Intelligence,22,2000,4-37.
    [70]Jain, A. K., Topchy, A., Law, M. H. C., and Buhmann, J. M., Landscape of clustering algorithms, International Conference on Pattern Recognition,2004,260-263.
    [71]Brice, C. R. and Fennema, C. L., Scene analysis using regions, Artificial Intelligence,1,1970,205-226.
    [72]Horowitz, S. L. and Pavlidis, T., Picture segmentation by a tree traversal algorithm, Journal of the ACM, 23,1976,368-388.
    [73]Ohlander, R., Price, K., and Reddy, D. R., Picture segmentation using a recursive region splitting method, Computer Graphics and Image Processing,8,1078,313-333.
    [74]Pavlidis, T. and Liow, Y.-T., Integrating region growing and edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence,12,1990,225-233.
    [75]Leclerc, Y. G., Constructing simple stable descriptions for image partitioning, International Journal of Computer Vision,3,1989,73-102.
    [76]Mumford, D. and Shah, J., Optimal approximations by piecewise smooth functions and Variational problems, Comm. Pure Appl. Math.,5,1989,577-685.
    [77]Shi, J. and Malik, J., Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence,8,2000,888-905.
    [78]Felzenszwalb, P. F. and Huttenlocher, D. P., Efficient graph-based image segmentation, International Journal of Computer Vision,59,2004,167-181.
    [79]Mori. G, Guiding model search using segmentation. Beijing, China:Institute of Electrical and Electronics Engineers Inc,2005.
    [80]Thome, N., D. Merad, and S. Miguet, Learning articulated appearance models for tracking humans:a spectral graph matching approach, Signal Processing:Image Communication,23(10),2008,769-787.
    [81]Bernier, O., P. Cheung-Mon-Chan, and A. Bouguet, Fast nonparametric belief propagation for real-time stereo articulated body tracking, Computer Vision and Image Understanding,113(1),2009,29-47.
    [82]Hern., Bayesian approach for morphology-based 2-D human motion capture, IEEE Transactions on Multimedia,9(4),2007,754-764.
    [83]Kindermann, R. and J. L. Snell, Markov Random Fields and Their Applications, American Mathematical Society,1980.
    [84]Clifford, P., Markov random fields in statistics, In G. R. Grimmett and D. J. A.Welsh (Eds.), Disorder in Physical Systems, A Volume in Honour of John M. Hammersley, Oxford University Press,1990,19-32.
    [85]Gupta, A., A. Mittal, and L.S. Davis, Constraint integration for efficient multiview pose estimation with self-occlusions, IEEE Transactions on Pattern Analysis and Machine Intelligence,30(3),2008,493-506.
    [86]Choe, K.S. and B.N. Jang, Minority-carrier lifetime optimization in silicon MOS devices by intrinsic gettering, Journal of Crystal Growth,218(2),2000,239-244.
    [87]D. G. Lowe, Distinctive image features from scale-invariant key points, International Journal of Computer Vision,2004,60(2),91-110.
    [88]K. Mikolajczyk, C. Sehmid, A Performance Evaluation of Local Descriptions, IEEE Transactions on Pattern Analysis and Machine Intelligence,27(10),2005,1615-1630.
    [89]Arasanathan Thayananthan, Bjoern Stenge, Philip H. S., and Roberto Cipolla, Shape Context and Chamfer Matching in Cluttered Scenes, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2003,127-133.
    [90]Haibin Ling and David W. Jacobs, Using the Inner-Distance for classification of Articulated Shapes, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005),2005, 719-726.
    [91]Navneet Dalal and Bill Triggs, Histograms of oriented Gradients for Human Detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005),2005,886-893.
    [92]Ramammani N, Building Parameterized Action Representations from Observation, Pennsylvania, the University of Pennsylvania,2000.
    [93]J. Lee, A Hierarchical Approach to Motion Analysis and Synthesis for Articulated Figures, Ph.D. thesis, Department of Computer Science, KAIST,2000.
    [94]Golub, G. H. and C. F. Van Loan, Matrix Computations (Third ed.), John Hopkins University Press, 1996.
    [95]Basilevsky, A., Statistical Factor Analysis and Related Methods:Theory and Applications, Wiley,1994.
    [96]Tipping, M. E. and C. M. Bishop, Probabilisiic principal component analysis, Journal of the Royal Statistical Society,21(3),1999,611-622.
    [97]Rubin, D. B. and D. T. Thayer, EM algorithms for ML factor analysis, Psychometrika,47(1),1982,69-76.
    [98]Robert Pless, Using Isomap to Explore Video Sequences, Proceedings of International Conference on Computer Vision (ICCV03),2003,1-8.
    [99]J. B. Tenenbaum, V. deSilva, and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science,290,2000,2319-2323.
    [100| T.F. Cox and M.A.A. Cox, Multidimensional Scaling, Chapman and Hall/CRC,2nd edition,2001.
    [101]L. Saul and S. Roweis, Think globally, fit locally:unsupervised learning of low dimensional manifolds, Journal of Machine Learning Research,4,2003,119-155.
    [102]Kapur. J., Maximum Entropy Methods in Science and Engineering, Wiley,1989.
    [103]Schwarz, H. R., Finite element methods, Academic Press,1988.
    [104]Jordan, M. I., Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to Variational methods for graphical models, In M. I. Jordan (Ed.), Learning in Graphical Models,1999,105-162.
    [105]Parisi, G., Statistical Field Theory, Addison-Wesley,1988.
    [106]Boyd, S. and L. Vandenberghe, Convex Optimization, Cambridge University Press,2004.
    [107]C. Ek, P. Torr, N. Lawrence, Gaussian process latent variable models for human pose estimation, Workshop on Machine Learning and Multimodal Interactions,2007,132-43.
    [108]L. Sigal, R. Memisevic, D. J. Fleet, Shared kernel information embedding for discriminative inference, IEEE Proceedings of Computer Vision and Pattern Recognition,2009,2852-2859.
    [109]D. Hardoon, S. Szedmak, and J. Shawe-Taylor, Canonical Correlation Analysis:An overview with application to learning methods, Neural Computation,16,2004,2639-2664.
    [110]F. Bach, M. Jordan, Kernel Independent Component Analysis, the Journal of Machine Learning Research,3,2003,1-48.
    [111]P. Lai, and C. Fyfe, Kernel and Nonlinear Canonical Correlation Analysis, International Journal of Neural Systems,10,2000,365-378.
    [112]X. He, and P. Niyogi, Locality Preserving Projections, Advances in Neural Information Processing Systems,2003.
    [113]K. Weinberger, F. Sha, L. Saul, Learning a Kernel Matrix for Nonlinear Dimensionality Reduction, Proceedings of the twenty-first International Conference on Machine Learning,2004,106.
    [114]L. Song, A. Smola, K. Borgwardt, A. Gretton, Colored maximum variance unfolding, Advances in Neural Information Processing Systems,20,2008,1385-1392.
    [115]N. Lawrence, J. Quinonero-Candela, Local distance preservation in the gp-lvm through back constraints, Proceedings of the 23rd International Conference on Machine Learning,2006,513-520.
    [116]E. frontier, Curious Labs Poser, Computer Software,2007.
    [117]Carnegie Mellon Motion Capture Darabase, http://mocap.cs.cmu.edu/,2002.
    [118]A. Agarwal, B. Triggs, Recovering 3D human pose from monocular images, IEEE transactions on pattern analysis and machine intelligence,2006,44-58.
    [119]T. Sun, S. Chen, Locality preserving cca with applications to data visualization and pose estimation, Image and Vision Computing,25,2007,531-543.
    [120]C. W. Gear, Multibody grouping from motion images, International Journal on Computer Vision, 29(2),1998,133-150.
    [121]J. Yan and M. Pollefeys, A general framework for motion segmentation:Independent articulated, rigid, non-rigid, degenerate and nondegenerate, European Conference on Computer Vision,4,2006,94-106.
    [122]S. Rao, R. Tron, R. Vidal, and Y. Ma, Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories, IEEE Transactions on Pattern Analysis and Machine Intelligence,32(10), 2010,832-1845.
    [123]Y. Ma, H. Derksen, W. Hong, and J. Wright, Segmentation of multivariate mixed data via lossy data coding and compression, IEEE Transactions on Pattern Analysis and Machine Intelligence,29(9), 2007,1546-1562.
    [124]J. Mercer, Functions of positive and negative type, and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society,209,1909,415-446.
    [125]E. J. Candes and B. Recht, Exact Matrix Completion via Convex Optimization, Foundations of Computational Mathematics,9(6),2009,717-772.
    [126]J.Ho, M.-H. Yang, J. Lim, K.-C. Lee, and D. J. Kriegman, Clustering appearances of objects under varying illumination conditions, IEEE Conference on Computer Vision and Pattern Recognition,1, 2003,11-18.
    [127]G. Liu, Z. Lin, X. Tang, and Y. Yu, Unsupervised object segmentation with a hybrid graph model (HGM), IEEE Transactions on Pattern Analysis and Machine Intelligence,32(5),2010,910-924.
    [1281 L. Lu and R. Vidal, Combined central and subspace clustering for computer vision applications, International conference on Machine learning,2006,593-600.
    [129]K. Huang and S. Aviyente, Sparse representation for signal classification, Advances in Neural Information Processing Systems,2006,609-616.
    [130]A. P. Costeira, Jo and T. Kanade, A multibody factorization method for independently moving objects, International Journal on Computer Vision,29(3),1998,159-179.
    [131]E. Elhamifar, R. Vidal, Sparse Subspace Clustering, IEEE Conference on Computer Vision and Pattern Recognition,2,2009,2790-2797.
    [132]G. Liu, Z. Lin, Y. Yu, Robust Subsapce Segmentation by Low-rank Representation, International Conference on Machine Learning,2010,663-670.
    [133]J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, Robust face recognition via sparse representation, IEEE Transactions on Pattern Analysis and Machine Intelligence.31.2008.210-227.
    [134]H. Yang, S. Lee. Reconstruction of 3D human body pose from stereo image sequences based on top-down learning, Pattern Recognition,40,2007,3120-3131.
    [135]C. Thurau, V. Hlavfac, Pose primitive based human action recognition in videos or still images, IEEE Conference on Computer Vision and Pattern Recognition,2008,1-8.
    [136]J. Shi and J. Malik. Normalized Cuts and Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence,10,2000,888-905.
    [137]Baxter, J., A bayesian/information theoretic model of learning via multiple task sampling, Machine Learning,28.1997.7-40.
    [138]Wegener, I., The Complexity of Boolean Functions, John Wiley & Sons,1987.
    [139]Hastad, J., Almost optimal lower bounds for small depth circuits, Proceedings of the 18th annual ACM Symposium on Theory of Computing, Berkeley, California, ACM Press,1986,6-20.
    [140]Orponen, P., Computational complexity of neural networks:a survey, Nordic Journal of Computing, 1(1),1994,94-110.
    [141]Hastad, J.,& Goldmann, M., On the power of small-depth threshold circuits, Computational Complexity,1,113-129.
    [142]Bengio, Y,& Le Cun, Y., Scaling learning algorithms towards AI, Large Scale Kernel Machines, MIT Press,2007.
    [143]Utgoff, P.,& Stracuzzi, D., Many-layered Learning, Neural Computation,14,2002,2497-2539.
    [144]LeCun, Y, Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard,W.,& Jackel, L., Backpropagation applied to handwritten zip code recognition, Neural Computation,1(4),1989,541-551.
    [145]LeCun, Y, Bottou, L., Bengio, Y.,& Haffner, P., Gradient based learning applied to document recognition, Proceedings of the IEEE,86(11),1998,2278-2324.
    [146]Simard, P.Y. Steinkraus, D., Platt, J., Best Practices for Convolutional Neural Networks, Proceedings of ICDAR,2003,201-208.
    [147]Ranzato, M., Huang, F., Boureau, Y,& LeCun, Y, Unsupervised learning of invariant feature hierarchies with applications to object recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2007,1-8.
    [148]Dayan, P., Hinton, G., Neal, R.,& Zemel, R., The Helmholtz machine, Neural Computation,7,1995, 889-904.
    [149]Hinton, G., Dayan, P., Frey, B.,& Neal, R., The Wake-sleep Algorithm for Unsupervised Neural Networks, Science,268,1995,1158-1161.
    [150]Saul, L., Jaakkola, T.,& Jordan, M., Mean field theory for sigmoid belief networks, Journal of Artificial Intelligence Research,4,1996,61-76.
    [151]Titov, I.,& Henderson, J., Constituent parsing with incremental sigmoid belief networks, Proc.45th Meeting of Association for Computational Linguistics (ACL 2007), Prague, Czech Republic,2007,1-8.
    [152]Hinton, G. E., and Salakhutdinov, R., Reducing the Dimensionality of Data with Neural Networks, Science,313,2006,504-507.
    [153]Bengio, Y., Simard, P.,& Frasconi, P., Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks,5(2),1994,157-166.
    [154]Lin, T., Horne, B., Tino, P.,& Giles, C., Learning long-term dependencies is not as difficult with NARX recurrent neural networks, Tech. rep. UMIACS-TR-95-78, Institute for Advanced Computer Studies, University of Maryland,1995.
    [155]LeCun, Y.,& Huang, F., Loss functions for discriminative training of energy-based models, Proc. of the 10-th International Workshop on Artificial Intelligence and Statistics (AIStats'05),2005,1-8.
    [156]LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M.-A.,& Huang, F.-J., A tutorial on energy-based learning, In Bakir, G., Hofmann, T., Schölkopf, B., Smola, A.,& Taskar, B. (Eds.), Predicting Structured Data, MIT Press,2006,40-52.
    [157]Hinton, G., Training products of experts by minimizing contrastive divergence, Neural Computation,14,2002,1771-1800.
    [158]Hinton, G., Products of Experts, Proceedings of the Ninth International Conference on Artificial Neural Networks (ICANN),1,1999,1-6.
    [159]Bengio, Y., Ducharme, R.,& Vincent, P., A neural probabilistic language model, Advances in Neural Information Processing Systems,13,2001,933-938.
    [160]Schwenk, H.,& Gauvain, J.-L., Connectionist language modeling for large vocabulary continuous speech recognition. International Conference on Acoustics, Speech and Signal Processing, Orlando, Florida,2002,765-768.
    [161]Bengio, Y., Ducharme, R., Vincent, P.,& Jauvin, C., A neural probabilistic language model, Journal of Machine Learning Research,3,2003,1137-1155.
    [162]Hinton, G., Sejnowski, T.,& Ackley, D., Boltzmann machines:Constraint satisfaction networks that learn, Tech. rep. TR-CMU-CS-84-119, Carnegie-Mellon University, Dept. of Computer Science,1984.
    [163]Ackley, D., Hinton, G.,& Sejnowski, T., A learning algorithm for Boltzmann machines, Cognitive Science,9,1985,41-48.
    [164]Welling, M., Rosen-Zvi, M.,& Hinton, G., Exponential family harmoniums with an application to information retrieval, Advances in Neural Information Processing Systems,17,2005,1-8.
    [165]Andrieu, C., de Freitas, N., Doucet, A.,& Jordan, M., An introduction to MCMC for machine learning, Machine Learning,50,2003,5-43.
    [166]Geman, S.,& Geman, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence,6,1984,72-84.
    [167]Smolensky, P., Information processing in dynamical systems:Foundations of harmony theory, Parallel Distributed Processing, MIT Press, Cambridge,1986,194-281.
    [168]Carreira-Perpiñán, M.,& Hinton, G., On contrastive divergence learning, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Jan 6-8,2005, Savannah Hotel, Barbados,2005,1-8.
    [169]Freund, Y.,& Haussler, D., Unsupervised learning of distributions on binary vectors using two layer networks, Tech. rep. UCSC-CRL-94-25, University of California, Santa Cruz,1994,1-40.
    [170]Le Roux, N.,& Bengio, Y., Representational power of restricted Boltzmann machines and deep belief networks, Neural Computation,12,2008,23-34.
    [171]Taylor, G., Hinton, G.,& Roweis, S., Modeling human motion using binary latent variables, Advances in Neural Information Processing Systems,20,2006,623-630.
    [172]Memisevic, R.,& Hinton, G., Unsupervised Learning of Image Transformations, IEEE Conference on Computer Vision and Pattern Recognition,2007,21-28.
    [173]Kalman, R.E., A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering,82(1),1960,35-45.
    [174]Y. Bar-Shalom, X. R. Li and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software, New York, Wiley,2001,1-8.
    [175]S. Julier and J. Uhlmann, A new extension of the Kalman filter to nonlinear systems, Proceedings of the 11th International Symposium on Aerospace/Defence Sensing, Simulation and Controls, Orlando, Florida,1997,182-193.
    [176]E. Wan and R. van der Merwe, The Unscented Kalman filter, In: S. Haykin (Eds.), Kalman Filtering and Neural Networks, John Wiley & Sons Inc,2001.
    [177]N. J. Gordon, D. J. Salmond and A. F. M. Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings of Radar and Signal Processing,140(2),1993,107-113.
    [178]M. Isard and A. Blake, CONDENSATION-conditional Density Propagation for Visual Tracking, International Journal of Computer Vision,29(1),1998,5-28.
    [179]R. Van der Merwe, A. Doucet, N. de Freitas, et al, The unscented particle filter, Technical report CUED/F-INFENG/TR380, Cambridge University Engineering Department, August,2000.
    [180]T. Higuchi, Monte Carlo filter using the genetic algorithm operators, Journal of Statistical Computation and Simulation,59(1),1997,1-23.
    [181]J. S. Liu and R. Chen, Sequential Monte Carlo Methods for Dynamic Systems, Journal of the American Statistical Association,93(443),1998,1032-1044.
    [182]G. Kitagawa, Monte Carlo filter and smoother for non-gaussian nonlinear state space models, Journal of Computational and Graphical Statistics,5,1996,1-25.
    [183]Jiandong T., Jing S., Yandong T. Tricolor Attenuation Model for Shadow Detection. IEEE Transactions on Image Processing,18(10),2009,2355-2363.
    [184]Shehata M. S., Jun C., Badawy W. M., et al. Video-Based Automatic Incident Detection for Smart Roads:The Outdoor Environmental Challenges Regarding False Alarms, IEEE Transactions on Intelligent Transportation Systems,9(2),2008,349-360.
    [185]Bui Tuong P. Illumination for Computer Generated Pictures. Communications of the ACM,18(6),1975,311-317.
    [186]Stander J., Mech R., Ostermann J., Detection of Moving Cast Shadows for Object Segmentation. IEEE Transactions on Multimedia,1(1),1999,65-76.
    [187]Cucchiara R., Grana C., Piccardi M., et al, Improving Shadow Suppression in Moving Object Detection with HSV Color Information. Proceedings of IEEE Conference on Intelligent Transportation Systems, ITS 2001,2001,334-339.
    [188]Forsyth D. A., A Novel Algorithm for Color Constancy, International Journal of Computer Vision,5(1),1990,5-36.
    [189]Z. Lin, M. Chen, L. Wu, and Y. Ma, The augmented Lagrange multiplier method for exact recovery of a corrupted low-rank matrix, Preprint,2009.
    [190]Bengio, Y., Learning deep architectures for AI, Foundations and Trends in Machine Learning,2(1),2009,1-127.
