Research on Multi-Robot Formation Methods Based on Reinforcement Learning
Abstract
This thesis presents methods for the dynamic formation of multiple robots in a two-dimensional space with obstacles. The following aspects are studied and discussed:
     First is the study of the architecture for multi-robot cooperation. Combining the advantages of decentralized and centralized control, the thesis adopts distributed control. The control structure is top-down and divided into three layers.
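The three-layer, top-down control structure could be sketched roughly as follows. This is a minimal illustration only; the class names, the obstacle-distance threshold, and the fixed waypoints are hypothetical, not taken from the thesis:

```python
from dataclasses import dataclass

@dataclass
class Percept:
    position: tuple
    obstacle_dist: float
    goal_angle: float

class TaskLayer:
    """Top layer: produces a global waypoint list (e.g. from a grid planner)."""
    def plan(self, percept):
        # Hypothetical fixed waypoints; a real planner would compute these.
        return [(1, 1), (2, 2), (3, 3)]

class BehaviorLayer:
    """Middle layer: selects a behavior for the current situation."""
    def select(self, percept, waypoints):
        return "avoid" if percept.obstacle_dist < 1.0 else "follow"

class ActionLayer:
    """Bottom layer: turns the chosen behavior into a motor command."""
    def act(self, behavior, percept):
        return ("turn", 1.0) if behavior == "avoid" else ("forward", 0.5)

def control_step(percept, task, behavior, action):
    """One top-down pass through the three layers."""
    waypoints = task.plan(percept)
    chosen = behavior.select(percept, waypoints)
    return action.act(chosen, percept)
```

The point of the split is that each layer has one clear responsibility, which is the property the abstract attributes to the multi-layer design.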
     Planning methods for each layer are then studied. The first layer is task-level planning: to save time and memory, a potential-field grid method is applied to solve the global path-planning problem for the multi-robot formation. The second layer implements behavior-level planning. The formation approach here mainly follows Brooks's behavior-based idea, decomposing the whole task into several behaviors. The formation method itself mainly uses reinforcement learning, through which a robot learns to take appropriate behaviors in different environments. Internal and external reinforcement signals are used to balance each robot's individual interest in the formation against the interest of the team as a whole. Through learning, the robots' adaptivity and autonomy are realized, as is the coordination within the team. The last layer implements action-level planning, using fuzzy control to select among different actions.
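A minimal sketch of the potential-field grid idea at the task level: each cell stores an attractive potential toward the goal plus a repulsive term near obstacles, and the robot greedily descends the potential over its 8-neighborhood. The grid size, weights, and radius here are illustrative assumptions, not the thesis's actual parameters:

```python
def potential_grid(width, height, goal, obstacles, repulse=4.0, radius=2):
    """Build a grid potential: attractive term (Manhattan distance to goal)
    plus a repulsive term added near each obstacle cell."""
    grid = {}
    for x in range(width):
        for y in range(height):
            u = abs(x - goal[0]) + abs(y - goal[1])   # attractive term
            for ox, oy in obstacles:
                d = max(abs(x - ox), abs(y - oy))     # Chebyshev distance
                if d <= radius:
                    u += repulse * (radius + 1 - d)   # repulsive term
            grid[(x, y)] = u
    for o in obstacles:
        grid[o] = float("inf")                        # obstacles impassable
    return grid

def descend(grid, start, goal, max_steps=200):
    """Greedy gradient descent over the 8-neighborhood; stops at the goal
    or at a local minimum of the potential."""
    path, cur = [start], start
    for _ in range(max_steps):
        if cur == goal:
            return path
        x, y = cur
        nbrs = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0) and (x + dx, y + dy) in grid]
        nxt = min(nbrs, key=lambda c: grid[c])
        if grid[nxt] >= grid[cur]:                    # local minimum: stop
            return path
        path.append(nxt)
        cur = nxt
    return path
```

Greedy descent can stall in local minima; that is the classic weakness of pure potential fields, and one reason the thesis layers other planning on top.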
     Different methods are adopted in the top-down layers, fully reflecting the robots' intelligence. Simulation experiments further verify the feasibility and effectiveness of all the methods.
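The blended internal/external reinforcement signal described above could be combined with tabular Q-learning roughly as follows. The toy one-dimensional task, reward values, and blend weights are hypothetical illustrations, not the thesis's formulation:

```python
import random

def q_learning(states, actions, step, episodes=500, alpha=0.1, gamma=0.9,
               eps=0.1, w_in=0.5, w_ex=0.5):
    """Tabular Q-learning where the scalar reward blends an internal signal
    (the robot's own progress) with an external one (the team's)."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(50):
            if random.random() < eps:                 # epsilon-greedy explore
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s2, r_in, r_ex, done = step(s, a)
            r = w_in * r_in + w_ex * r_ex             # blended reinforcement
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

def make_step(goal=4, n=5):
    """Toy 1-D chain: move left/right; both signals reward reaching the goal."""
    def step(s, a):
        s2 = min(n - 1, s + 1) if a == "right" else max(0, s - 1)
        r_in = 1.0 if s2 == goal else 0.0     # individual reward
        r_ex = 0.5 if s2 == goal else -0.01   # hypothetical team reward
        return s2, r_in, r_ex, s2 == goal
    return step
```

Tuning `w_in` against `w_ex` is exactly the trade-off the abstract mentions between a robot's own interest and the interest of the formation.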
The thesis addresses techniques for the dynamic formation of multiple robots in a two-dimensional field with obstacles. The following aspects are investigated and discussed.
    First is the study of the architecture for multi-robot cooperation. The system combines the advantages of centralized and decentralized control in a distributed architecture. With this multi-layer control the task of each part is clear, and the architecture is organized top-down into three levels.
    Then each level is studied in detail. The first is the task-planning level: to save time and memory, a potential-field grid method is used to solve the global path-planning problem. The second level plans the robots' behavior: following Brooks's behavior-based idea, the formation task is decomposed into several behaviors, and reinforcement learning is the main way to choose the appropriate behavior in different environments. In the reinforcement learning algorithm explained here, internal and external reinforcement signals are applied to represent the interests of the individual robot and of its whole group. In this way the self-adaptivity, autonomy, and cooperation of the robots are all clearly embodied. The last level is action planning, where fuzzy control chooses among the different actions.
    In the simulation experiments, the feasibility of the technique is further verified.
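At the action level, fuzzy selection between candidate actions could look like this minimal sketch: triangular memberships classify the obstacle distance, and the steering command is a weighted average of an avoidance turn and a goal-seeking turn. The membership breakpoints and command values are illustrative assumptions, not the thesis's rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_steer(obstacle_dist, goal_angle):
    """Fuzzy blend of 'avoid' and 'seek' steering commands."""
    near = tri(obstacle_dist, -1.0, 0.0, 2.0)    # obstacle is near
    far = tri(obstacle_dist, 0.5, 3.0, 10.0)     # obstacle is far
    avoid_cmd = 1.0         # turn hard away from the obstacle
    seek_cmd = goal_angle   # turn toward the goal
    w = near + far
    # Defuzzify by the weighted average of the candidate commands.
    return (near * avoid_cmd + far * seek_cmd) / w if w else seek_cmd
```

Because the memberships overlap, the command transitions smoothly between avoidance and goal seeking instead of switching abruptly, which is the usual reason for choosing fuzzy control at this layer.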
