Research on a Reinforcement Learning System Based on Visual-Auditory Semantic Coherence
Abstract
The learning support system has been one of the active research topics in the application of artificial intelligence to education in recent years; it lies at the intersection of pedagogy, cognitive science, and computer science. The theory and techniques of multi-agent systems offer a novel approach to analyzing, designing, and implementing distributed open systems, and have therefore been widely applied to learning support systems. As computers and networks develop, people increasingly rely on the network to interact, and the way they learn has changed; learning support systems must change accordingly, which places higher demands on the technologies used to implement them.
     Starting from an analysis of the shift in educational thinking and the new ways of learning under these circumstances, this dissertation studies multi-agent reinforcement learning systems, covering both single-agent and multi-agent reinforcement learning algorithms, and explores problems arising in practical application such as personalized user profiling and personalized presentation of learning content. The main research work is as follows:
     (1) Following the thread of historical development, we summarize how computer-aided instruction has grown increasingly intelligent, analyze the strengths and weaknesses of intelligent tutoring systems, and argue that adaptive learning support systems are the development trend of today's e-learning support platforms.
     (2) Building on an in-depth study of the classical reinforcement learning algorithms, TD learning and Q-learning, we propose a reinforcement learning algorithm based on bias information. Preset bias information or prior knowledge guides the agent's action selection policy and steers its exploration of the state space, while the existing knowledge is continually revised and refined during learning, so that the algorithm converges faster.
     (3) We propose a semi-Markov game (SMG) model suited to multi-agent hierarchical reinforcement learning in continuous state spaces, together with the corresponding MAHRL collaborative framework. Cooperative and non-cooperative subtasks are formalized separately, the workflow of the multi-agent hierarchical reinforcement learning system is described, and the core of the MAHRL collaborative framework, a hierarchical reinforcement learning algorithm based on Pareto-dominant solutions, is given. Simulation experiments verify the effectiveness and superiority of the SMG model, the MAHRL collaborative framework, and the Pareto-dominance-based hierarchical reinforcement learning algorithm.
     (4) Based on Cattell's Sixteen Personality Factor (16PF) questionnaire, we propose an algorithm that obtains quantified values of a trainee's key personality attributes through psychological testing.
     (5) We propose a personalized presentation algorithm for frightening accident scenes. Combined with the reinforcement learning algorithm proposed in this dissertation, it supports skill learning: the difficulty of operations and the knowledge test points are adjusted to the trainee's mastery of knowledge and skills, so that excessive difficulty does not leave trainees confused and discouraged, and dull, repetitive operations and tests do not bore them into losing interest.
     Finally, taking coal mine rescue training as an example, we implement a prototype training system based on visual-auditory semantic coherence. According to a user's personal profile, the system retrieves materials that match the user, assembles them, and delivers them to the user, thereby realizing personalized training.
The adaptive learning support system has been a focal research topic in artificial intelligence in education in recent years; it lies at the intersection of education, cognitive science, and computer science. The theory and techniques of multi-agent systems offer a novel way to analyze, design, and implement distributed open systems, so they are also applied to learning support systems. With the rapid development of computer and network technology, people rely more and more on networks to communicate, and the way they learn has changed. As a result, learning support systems have to change, which places higher demands on the techniques used to implement them.
     Starting from an analysis of this change in learning styles, this paper studies multi-agent reinforcement learning systems, including both single-agent and multi-agent reinforcement learning algorithms. It also addresses critical application technologies such as user profiling and personalized learning environments. The paper completes the following tasks:
     (1) Based on a literature review, we summarize how computer-aided instruction has grown increasingly intelligent, analyze the strengths and weaknesses of intelligent tutoring systems, and hold that the adaptive learning support system is the current trend in e-learning platforms.
     (2) A bias-based reinforcement learning algorithm is presented, built on a detailed analysis of the TD and Q-learning algorithms. Bias information, that is, prior knowledge, is incorporated to influence the action selection strategy and boost the learning process; errors in the prior knowledge are corrected as learning proceeds, and the learning speed is thereby accelerated.
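As a minimal sketch of how bias information can steer action selection in tabular Q-learning (an illustration under assumed names, not the paper's actual implementation), the Python fragment below adds a preset bias table to the Q-values during greedy selection and decays that bias as learning proceeds, so corrected knowledge gradually takes over:

    import random
    from collections import defaultdict

    class BiasedQLearner:
        """Tabular Q-learning whose action choice is biased by prior knowledge."""

        def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, bias_decay=0.99):
            self.actions = list(actions)
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.bias_decay = bias_decay              # how fast the preset bias fades
            self.q = defaultdict(float)               # learned values Q(s, a)
            self.bias = defaultdict(float)            # preset bias / prior knowledge B(s, a)

        def select_action(self, state):
            # Explore with probability epsilon; otherwise exploit Q + bias.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)] + self.bias[(state, a)])

        def update(self, state, action, reward, next_state):
            # Standard one-step Q-learning update.
            best_next = max(self.q[(next_state, a)] for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
            # Shrink the bias so learned values dominate once the prior has been corrected.
            self.bias[(state, action)] *= self.bias_decay

The bias_decay schedule is only one possible way to revise the prior knowledge during learning; annealing or error-driven corrections would serve the same purpose.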
     (3) A semi-Markov game (SMG) model is presented that effectively expresses the hierarchical learning tasks of a multi-agent system and the temporal, sequential characteristics of joint actions, and that supports multi-agent hierarchical reinforcement learning over continuous state spaces. A collaborative MAHRL framework based on the SMG model is then given; it describes the collaborative and non-collaborative tasks among agents and elaborates the workflow of the MAHRL system. Finally, the paper gives the hierarchical reinforcement learning algorithm based on Pareto-optimal solutions, which is the kernel of the collaborative MAHRL framework. Experiments confirm the validity and superiority of the model, the framework, and the algorithm.
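The hierarchical algorithm itself is not reproduced here, but the Pareto-dominance test such a selection step rests on is easy to state. The sketch below (an illustrative assumption, not the MAHRL code) filters candidate joint actions, each scored with one value per agent, down to the non-dominated set:

    from typing import Dict, Tuple

    def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
        # u Pareto-dominates v: no worse for every agent, strictly better for at least one.
        return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

    def pareto_front(joint_values: Dict[str, Tuple[float, ...]]) -> Dict[str, Tuple[float, ...]]:
        # Keep only joint actions whose value vectors no other candidate dominates.
        return {a: v for a, v in joint_values.items()
                if not any(dominates(w, v) for b, w in joint_values.items() if b != a)}

    # Example: per-agent values (agent 1, agent 2) for three candidate joint actions.
    candidates = {"a1": (1.0, 2.0), "a2": (2.0, 2.5), "a3": (1.5, 1.0)}
    print(pareto_front(candidates))    # {'a2': (2.0, 2.5)}: a2 dominates the other two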
     (4) An algorithm based on Cattell's Sixteen Personality Factor (16PF) questionnaire is presented to obtain quantified values of the trainee's key personality attributes.
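In practice, quantifying key personality attributes usually means computing a per-factor score from questionnaire answers with a scoring key and rescaling it to a fixed range. The sketch below is a hedged illustration with made-up factor names, item keys, and a 1-10 rescaling; the dissertation's actual 16PF scoring tables are not reproduced:

    from typing import Dict, List

    # Hypothetical scoring key: which item indices count toward each factor.
    SCORING_KEY: Dict[str, List[int]] = {
        "dominance": [0, 3, 7],
        "stability": [1, 4, 8],
        "tension":   [2, 5, 6],
    }

    def score_factors(answers: List[int], max_item_score: int = 2) -> Dict[str, float]:
        # Sum each factor's items, then map the raw score onto a 1-10 scale.
        profile = {}
        for factor, items in SCORING_KEY.items():
            raw = sum(answers[i] for i in items)
            max_raw = max_item_score * len(items)
            profile[factor] = 1 + 9 * raw / max_raw
        return profile

    # Example: nine questionnaire items answered on a 0-2 scale.
    print(score_factors([2, 1, 0, 2, 2, 1, 0, 1, 2]))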
     (5) A personalized presentation algorithm for frightening accident scenes is also presented. Combined with the reinforcement learning algorithm presented in this paper, it adjusts the difficulty of the practice tasks and the knowledge test points to the trainee's mastery.
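A minimal sketch of the adaptive-difficulty idea, under assumed class names and thresholds that are not taken from the thesis: recent success is tracked over a sliding window, and the difficulty level is raised when practice becomes too easy and lowered when it becomes discouraging:

    from collections import deque

    class DifficultyAdapter:
        """Keep task difficulty in a band where the trainee is neither lost nor bored."""

        def __init__(self, levels=range(1, 6), window=10, low=0.4, high=0.85):
            self.levels = list(levels)
            self.results = deque(maxlen=window)   # recent pass/fail outcomes
            self.low, self.high = low, high       # target success-rate band
            self.index = 0                        # start at the easiest level

        def record(self, passed: bool) -> int:
            self.results.append(passed)
            rate = sum(self.results) / len(self.results)
            if rate > self.high and self.index < len(self.levels) - 1:
                self.index += 1                   # too easy: raise difficulty
            elif rate < self.low and self.index > 0:
                self.index -= 1                   # too hard: lower difficulty
            return self.levels[self.index]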
     Finally, a prototype system for mine accident rescue training is implemented. The system obtains the user's personality data, retrieves the matching materials according to the related rules, and then provides a personalized learning environment to the user.
