The Influence of Objects' Spatial Properties on the View Combination Mechanism in Scene Recognition
Abstract
Humans must be able to recognize objects and scenes from novel viewpoints in order to interact with a dynamic environment. Extensive research on this capacity of the human visual system has yielded two basic classes of cognitive models: structural description models and view-based models. View combination is one theoretical account within the view-based class, developed from a computational perspective. It refers to the process by which recognition generalizes to novel views through the combination of object views represented in memory. On this account, how readily a novel view of an object can be recognized depends on its degree of structural similarity to a set of views stored in memory. A large body of empirical work on view combination in object recognition has shown that its strength is modulated by factors such as the similarity between the two previously experienced views, the spatiotemporal continuity with which the two views are presented, and occlusion. In scene recognition, by contrast, research on view combination remains relatively scarce.
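A standard computational formulation of this idea comes from Poggio and Edelman's (1990) network, cited in the references below. As an illustrative sketch rather than the thesis's own model, the recognition response to a novel view $v$ can be written as a similarity-weighted combination of stored views $v_i$:

$$r(v) \;=\; \sum_{i=1}^{n} w_i \,\exp\!\left(-\,\frac{\lVert v - v_i \rVert^{2}}{2\sigma^{2}}\right)$$

where the $w_i$ are learned weights and $\sigma$ sets the breadth of generalization. A novel view is accepted when $r(v)$ exceeds a criterion, so views structurally close to the stored views (such as interpolated views) yield stronger responses than views far outside the stored range.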
     Building on an improved version of Friedman and Waller's (2008) paradigm, the present study used virtual-reality technology to create a high-fidelity virtual scene modeled on the real round-table scene of Diwadkar and McNamara's (1997) experiments, and examined the internal processing mechanism of view combination after participants learned a scene from two viewpoints. Because scenes contain various kinds of complex information, we separated two spatial properties of the scene (location information and identity information) to further examine their respective effects on view combination.
     All three experiments consisted of two phases: a training phase and a testing phase. In both phases, participants performed a scene recognition task, judging whether the layout shown on the screen was the correct layout defined by the experiment. In the training phase, the target layout or a distractor layout was presented from two viewpoints 75° apart (we refer to the 0° and 75° views as the training views); because participants were not told which layout was correct, they could learn it only by inference from the feedback provided on each trial. In the testing phase, the correct layout or an incorrect layout was presented from five viewpoints spaced 37.5° apart (the 0° and 75° training views, the 37.5° interpolated view, the 112.5° extrapolated view, and the 150° far view), and no feedback was given.
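The viewpoint geometry of the paradigm can be made concrete with a short sketch (Python; purely illustrative, not the experiment software, and names such as TRAINING_VIEWS are ours). It verifies the key design property that the interpolated and extrapolated test views lie at the same angular distance (37.5°) from the nearest training view:

```python
# Illustrative sketch of the viewpoint geometry (not the experiment software).
TRAINING_VIEWS = [0.0, 75.0]        # views studied during the training phase (deg)
TEST_VIEWS = {                      # views probed during the testing phase (deg)
    "training": [0.0, 75.0],        # previously studied
    "interpolated": [37.5],         # between the two training views
    "extrapolated": [112.5],        # outside the training range
    "far": [150.0],                 # farthest from the training views
}

def nearest_training_distance(angle):
    """Angular distance from a test view to the closer training view."""
    return min(abs(angle - t) for t in TRAINING_VIEWS)

for label, angles in TEST_VIEWS.items():
    for a in angles:
        print(f"{label:>12} view {a:6.1f} deg -> "
              f"{nearest_training_distance(a):.1f} deg from nearest training view")
# interpolated (37.5) and extrapolated (112.5) are both 37.5 deg from a training view
```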
     In Experiment 1, the main independent variables were test view type (training, interpolated, extrapolated, and far views) and distractor type (one cup moved vs. two cups switched); the dependent variables were response times for correct responses and error rates. In response times, participants recognized the interpolated view faster than the other views, and recognized the training views faster than the extrapolated and far views. In error rates, the switch (identity information) condition showed the same pattern as the response times, whereas in the move condition the error rate for the interpolated view was higher than for the other views. The reasons for the divergent response-time and error-rate patterns in the move condition are elaborated in the discussion.
     Experiment 2 examined how different spatial properties of a scene affect the view combination mechanism. We constructed two types of scenes: scenes carrying location information (an irregular layout of identical cups, with a distractor created by moving one cup to a new position) and scenes carrying identity information (an irregular layout of different cups, with a distractor created by switching the positions of two cups). The main independent variables were test view type (training, interpolated, extrapolated, and far views) and the spatial property contained in the scene (location vs. identity information). In response times, both main effects and their interaction were significant. In the location condition, response times for the interpolated view were longer than for the training views but shorter than for the extrapolated view; in the identity condition, the interpolated view was recognized as fast as the training views, and both were faster than the extrapolated view. Error rates showed essentially the same pattern as response times.
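The two distractor constructions can be illustrated with a minimal sketch (Python; the layout values and function names are hypothetical, not the thesis's stimulus code):

```python
import random

# A layout is a list of (cup_identity, position) pairs on the virtual tabletop.
# Hypothetical values for illustration only.
layout = [("red_cup", (0.2, 0.1)), ("blue_cup", (0.6, 0.4)), ("tall_cup", (0.3, 0.8))]

def move_distractor(layout, new_position):
    """Location condition: displace one randomly chosen cup (the cups are
    identical, so only the spatial configuration distinguishes the layouts)."""
    out = list(layout)
    i = random.randrange(len(out))
    out[i] = (out[i][0], new_position)
    return out

def switch_distractor(layout):
    """Identity condition: swap the positions of two randomly chosen cups
    (the cups are distinct, so identity-location bindings distinguish the layouts)."""
    out = list(layout)
    i, j = random.sample(range(len(out)), 2)
    out[i], out[j] = (out[i][0], out[j][1]), (out[j][0], out[i][1])
    return out
```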
     Experiment 3 further examined the effect of dynamic cues on view combination for the two types of spatial information. The design was identical to Experiment 2, except that at the beginning of each training block the scene was presented from one viewpoint for 3 s, a black cloth then occluded the layout while the view rotated at an angular speed of 75°/s for 1 s to the other viewpoint, and the cloth was then removed so that the layout was visible again for 3 s. In the location condition, participants recognized the interpolated view as accurately and as quickly as the training views, and both were recognized better than the extrapolated and far views. The recognition pattern in the identity condition was essentially similar to that in the location condition.
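Note that the rotation covers 75°/s × 1 s = 75°, exactly the angular separation between the two training views, so the dynamic cue links the first training view to the second. A minimal timing sketch (Python; illustrative, not the experiment software):

```python
# Timing of one dynamic training presentation (illustrative only).
STATIC_S, ROTATION_S, SPEED_DEG_S = 3.0, 1.0, 75.0   # view time, rotation time, speed

def camera_azimuth(t):
    """Camera azimuth (deg) at time t (s); the layout is occluded by the
    black cloth during the rotation phase and visible otherwise."""
    if t < STATIC_S:                      # first training view, visible
        return 0.0
    if t < STATIC_S + ROTATION_S:         # occluded rotation at 75 deg/s
        return SPEED_DEG_S * (t - STATIC_S)
    return SPEED_DEG_S * ROTATION_S       # second training view (75 deg), visible

assert camera_azimuth(STATIC_S + ROTATION_S) == 75.0  # span matches view separation
```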
     The results of all three experiments support a view combination mechanism in scene recognition. To probe the characteristics of this mechanism, we introduced two factors, spatial property and motion cue, and by sorting the complex information contained in a scene into single, separable categories we examined in detail how each spatial property shapes view combination. View combination was generally more effective in the identity information condition than in the location information condition. Adding a brief dynamic rotation at the start of training raised, to some extent, the overall level of abstraction of the scene representation and facilitated view combination for identity information, while having no clear effect on location information. This suggests that the two types of spatial property information influence view combination in essentially different ways.
To interact with a dynamic environment, humans must be capable of accurately and quickly recognizing objects or scenes despite changes in view caused by movement of the object or of the observer. Research on this ability of human vision has been largely driven by two classes of cognitive models: structural description models and view-based models. One theory within the view-based class, known as view combination, addresses object recognition from a computational perspective: it proposes that generalization to novel views of an object is accomplished by combining multiple object views represented in memory. That is, the extent to which a novel view of an object can be readily recognized depends on its degree of structural similarity to a set of multiple stored views. Although many empirical studies of view combination in object recognition have established that factors such as the similarity and the spatiotemporal continuity between two studied views affect view combination processing, only a few have investigated how views of scenes are combined across different viewpoints and how the visual system uses this information in scene recognition.
     In the present study, building on an improved version of Friedman and Waller's (2008) view combination paradigm, we used virtual-reality technology to create a setting much like the real table layout of Diwadkar and McNamara (1997) in order to explore the view combination mechanism. Because various kinds of features are intermingled in our surrounding environment, we further distinguished the effects of location information and identity information on view combination in scene recognition.
     Each of the three experiments reported here consisted of two phases: a training phase and a testing phase. Participants performed a scene recognition task regardless of the perspective from which the scene was shown. The virtual table layout was presented from two ground-level perspectives 75° apart in the training phase (the 0° and 75° training views) and from three additional perspectives in the testing phase: an interpolated view between the two training perspectives (37.5°), an extrapolated view outside the training range (112.5°), and a far view (150°). The interpolated and extrapolated views were equidistant from the nearest training view. Participants received feedback in the training phase but none in the testing phase.
     In Experiment 1, the independent variables of interest were test view type (training, interpolated, extrapolated, and far views) and distractor type (move vs. switch). Participants recognized interpolated views as fast as the trained views, and both were recognized faster than extrapolated views, even though the extrapolated and interpolated views were equidistant from a training view. The same pattern was replicated in error rates in the switch condition but not in the move condition, where interpolated views instead produced higher error rates than the other views; the reason for this difference is elaborated in the discussion.
     In Experiment 2, to explore the effect of different spatial properties on the view combination mechanism, we created two sorts of scenes composed of different spatial information. One sort consisted of identical cups arranged irregularly on the virtual tabletop, with a distractor in which one cup was moved to another position. The other sort consisted of different cups (differing in color, texture, or shape), with a distractor in which two cups were switched. The independent variables were again test view type (training, interpolated, extrapolated, and far views) and the spatial property contained in the scene (location vs. identity information). In the location condition, participants recognized interpolated views more slowly than the trained views but faster than extrapolated views; in the identity condition, they recognized interpolated views as well as the trained views, and faster than extrapolated views. Error rates replicated these patterns in both conditions. Together these results indicate that a view combination effect occurred in both conditions and that the effect was stronger in the identity condition than in the location condition.
     In Experiment 3, we introduced dynamic cues. With all else held equal, we changed how the layout was presented in order to examine how participants use dynamic cues. The layout was first presented from one perspective for 3 s; it was then covered by a black cloth while the virtual camera rotated around the center of the layout for 1 s at an angular speed of 75°/s. When the camera reached the second viewpoint, the cloth was removed and the layout was presented again for 3 s. In both conditions, participants performed much as they had in Experiment 2. Moreover, in the identity condition they recognized interpolated views as well as the trained views and far faster than extrapolated views, performing better than in Experiment 2. Thus motion cues facilitated the view combination effect in the identity condition, whereas they produced no apparent facilitation in the location condition.
     In conclusion, these experiments provide further evidence for view combination in scene recognition and shed light on how location and identity information individually influence the view combination process. The results also invite further consideration of how dynamic cues contribute to the effects of spatial properties on view combination.
References
    Huang, X. T. (2004). 简明心理学辞典 [Concise dictionary of psychology]. Hefei: Anhui People's Publishing House.
    Lin, C. D., Yang, Z. L., & Huang, X. T. (2003). 心理学大辞典 [The comprehensive dictionary of psychology]. Shanghai: Shanghai Education Press.
    Burgess, N. (2002). The hippocampus, space, and viewpoints in episodic memory. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology, 55(4), 1057-1080.
    Castelhano, M. S., Pollatsek, A., & Rayner, K. (2009). Integration of multiple views of scenes. Attention, Perception, & Psychophysics, 71(3), 490-502.
    Christou, C. G., & Bülthoff, H. H. (1999). View dependence in scene recognition after active learning. Memory & Cognition, 27(6), 996-1007.
    Diwadkar, V. A., & McNamara, T. P. (1997). Viewpoint dependence in scene recognition. Psychological Science, 8(4), 302-307.
    Edelman, S., & Bülthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32(12), 2385-2400.
    Finke, K., Bublak, P., & Zihl, J. (2006). Visual spatial and visual pattern working memory: Neuropsychological evidence for a differential role of left and right dorsal visual brain. Neuropsychologia, 44(4), 649-661.
    Friedman, A., Spetch, M. L., & Ferrey, A. (2005). Recognition by humans and pigeons of novel views of 3-D objects and their photographs. Journal of Experimental Psychology: General, 134(2), 149-162.
    Friedman, A., Vuong, Q. C., & Spetch, M. L. (2009). View combination in moving objects: The role of motion in discriminating between novel views of similar and distinctive objects by humans and pigeons. Vision Research, 49(6), 594-607.
    Friedman, A., & Waller, D. (2008). View combination in scene recognition. Memory & Cognition, 36(3), 467-478.
    Marr, D. (1982). Vision. San Francisco: Freeman.
    McNamara, T. P., Diwadkar, V. A., Blevins, W. A., & Valiquette, C. M. (2006). Representations of apparent rotation. Visual Cognition, 13(3), 273-307.
    Mou, W., Zhao, M., & McNamara, T. P. (2007). Layout geometry in the selection of intrinsic frames of reference from multiple viewpoints. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(1), 145-154.
    Mou, W., McNamara, T. P., Valiquette, C. M., & Rump, B. (2004). Allocentric and egocentric updating of spatial memories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(1), 142-157.
    O'Keefe, J., & Burgess, N. (1996). Geometric determinants of the place fields of hippocampal neurons. Nature, 381(6581), 425-428.
    Owen, A. M., Milner, B., Petrides, M., & Evans, A. C. (1996). Memory for object features versus memory for object location: A positron-emission tomography study of encoding and retrieval processes. Proceedings of the National Academy of Sciences of the United States of America, 93(17), 9212-9217.
    Perrett, D. I., Oram, M. W., & Ashbridge, E. (1998). Evidence accumulation in cell populations responsive to faces: An account of generalisation of recognition without mental transformations. Cognition, 67(1-2), 111-145.
    Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343(6255), 263-266.
    Poggio, T., & Girosi, F. (1989). A theory of networks for approximation and learning. Cambridge, MA: Massachusetts Institute of Technology.
    Postle, B. R., D'Esposito, M., & Corkin, S. (2005). Effects of verbal and nonverbal interference on spatial and object visual working memory. Memory & Cognition, 33(2), 203-212.
    Schwoebel, J., & Srinivas, K. (2000). Recognizing objects seen from novel viewpoints: Effects of view similarity and time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(4), 915-928.
    Shelton, A. L., & McNamara, T. P. (1997). Multiple views of spatial memory. Psychonomic Bulletin & Review, 4, 102-106.
    Shelton, A. L., & McNamara, T. P. (2001). Systems of spatial reference in human memory. Cognitive Psychology, 43(4), 274-310.
    Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171(3972), 701-703.
    Simons, D. J. (1996). Accurate visual detection of layout changes requires a stable observer position. Investigative Ophthalmology and Visual Science, 37(3), S519.
    Spetch, M. L., & Friedman, A. (2003). Recognizing rotated views of objects: Interpolation versus generalization by humans and pigeons. Psychonomic Bulletin & Review, 10(1), 135-140.
    Spetch, M. L., Friedman, A., & Reid, S. L. (2001). The effect of distinctive parts on recognition of depth-rotated objects by pigeons (Columba livia) and humans. Journal of Experimental Psychology: General, 130(2), 238-255.
    Srinivas, K., & Schwoebel, J. (1998). Generalization to novel views from view combination. Memory & Cognition, 26(4), 768-779.
    Tarr, M. J. (1995). Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonomic Bulletin & Review, 2, 55-82.
    Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21(2), 233-282.
    Ullman, S., & Basri, R. (1991). Recognition by linear combinations of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10), 992-1006.
    Valiquette, C., & McNamara, T. P. (2007). Different mental representations for place recognition and goal localization. Psychonomic Bulletin & Review, 14, 676-680.
    Waller, D., Friedman, A., Hodgson, E., & Greenauer, N. (2009). Learning scenes from multiple views: Novel views can be recognized more efficiently than learned views. Memory & Cognition, 37(1), 90-99.
    Wilson, F. A. W., Scalaidhe, S. P. O., & Goldman-Rakic, P. S. (1993). Dissociation of object and spatial processing domains in primate prefrontal cortex. Science, 260(5116), 1955-1958.
    Wong, A. C. N., & Hayward, W. G. (2005). Constraints on view combination: Effects of self-occlusion and differences among familiar and novel views. Journal of Experimental Psychology: Human Perception and Performance, 31(1), 110-121.
    Xu, Y. D., & Chun, M. M. (2006). Dissociable neural mechanisms supporting visual short-term memory for objects. Nature, 440(7080), 91-95.
    Yantis, S., & Nakama, T. (1998). Visual interactions in the path of apparent motion. Nature Neuroscience, 1(6), 508-512.
