Biped Robot Control Based on Q-Learning and Neural Networks

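English title: Learning Biped Locomotion Based on Q-Learning and Neural Networks
Author: Peng Ziqiang (彭自强)
Thesis level: Master's
Discipline: Control Theory and Control Engineering
Chinese keywords (translated): biped gait; Q-learning; eligibility trace; BP neural network; simulation platform; underactuated robot
English keywords: Biped locomotion; Q-Learning; Eligibility trace; BP Neural Networks; Simulation platform; Quasi-PDW biped robot
Degree year: 2012
Advisors: Pan Gang (潘刚); Yu Ling (于玲)
Discipline code: 081101
Degree-granting institution: Zhejiang University (浙江大学)
Thesis submission date: 2012-03-12
Defense committee chair: Dai Liankui (戴连奎)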

Abstract

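Passive dynamics theory holds that walking is an inherent property of a biped robot, so the robot's own dynamics can be exploited to improve energy efficiency. Because robots differ in structure, their walking dynamics differ as well, which makes it difficult to use the trajectory of a human or of another robot as a reference gait. Q-learning accumulates experience through repeated trial and error, which lets a robot exploit its own dynamics and learn to walk autonomously through interaction with its environment. Biped walking is a continuously varying process (except at the instant of impact). This thesis studies walking control for biped robots: it uses a Q-learning controller based on a neural network to learn over continuous states, and it develops a dynamics simulation platform and a robot experimental platform. The main contributions are as follows:
     1. During biped walking the robot's state changes essentially continuously (except at the instant of impact). To control the robot over continuous states, this thesis adopts a Q-learning control method based on a neural network. A multi-input, multi-output BP neural network replaces the discrete Q-value table and computes the Q-values for a continuous state. Q-learning uses eligibility traces to solve the temporal credit assignment problem; incorporating the eligibility-trace idea into the gradient descent algorithm yields Q-learning control over continuous states. To reduce the dimension of the neural network, the thesis proposes an inverted-pendulum pose-kinetic-energy model. An ε-decay greedy algorithm is used to reduce the probability that Q-learning becomes trapped in a local minimum. Simulation produced a stable, natural, and periodic dynamic gait, which verifies the effectiveness of the algorithm.
     2. To simplify operation and improve research efficiency, a simulation platform suited to biped walking research was developed. A parametric model library was built in ADAMS containing five models: 2-link, 3-link, 4-link, 5-link, and 7-link. Custom menus and dialogs support loading and initializing models, modifying parameters, and displaying results. ADAMS/Controls, the interface module between ADAMS and MATLAB, is used to run biped walking co-simulation across ADAMS and MATLAB. Simulation experiments show that the platform avoids the complex modeling process, simplifies tedious operations, and markedly improves simulation efficiency.
     3. Based on passive dynamic control theory, an 8-degree-of-freedom underactuated 2D biped walking mechanism was prototyped. The knee joints are passive and have a locking mechanism. The hip and ankle joints are active and are driven by a DC servo motor and a virtual flexible actuator. The experimental platform uses a typical centralized control system, with a CAN bus for fast communication. The real-time control software provides initialization, periodic control, data acquisition, communication, data saving, fault handling, and shutdown handling. The biped walking robot experimental platform designed in this thesis is simple, easy to use, and high precision.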
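
Item 1 above describes replacing the Q-table with a multi-output BP network and folding eligibility traces into gradient descent. The following is a minimal sketch of that idea, not the thesis's implementation. The class name `QNet`, the tanh hidden layer, the network size, and every hyperparameter value are assumptions chosen for illustration. The sketch uses plain accumulating traces and does not reproduce the inverted-pendulum state model.

```python
import math
import random


class QNet:
    """Tiny one-hidden-layer network: state vector in, one Q-value per action out.

    Hypothetical illustration; sizes and hyperparameters are assumptions.
    """

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        # Each row holds the incoming weights of one unit; the last entry is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_actions)]
        self.reset_traces()

    def reset_traces(self):
        """Zero the eligibility traces (call at the start of every episode)."""
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [[0.0] * len(row) for row in self.w2]

    def forward(self, state):
        x = list(state) + [1.0]                       # input plus bias term
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        hb = h + [1.0]                                # hidden activations plus bias
        q = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        return q, x, hb

    def q_values(self, state):
        return self.forward(state)[0]

    def update(self, s, a, r, s_next, done, alpha=0.05, gamma=0.95, lam=0.8):
        """One Q(lambda) step: decay traces, add grad of Q(s, a), move weights."""
        q, x, hb = self.forward(s)
        target = r if done else r + gamma * max(self.q_values(s_next))
        delta = target - q[a]                         # temporal-difference error
        decay = gamma * lam
        # Hidden-layer traces: dQ_a/dw1[k][i] = w2[a][k] * (1 - h_k^2) * x_i
        for k, trace in enumerate(self.e1):
            g_h = self.w2[a][k] * (1.0 - hb[k] ** 2)
            for i, x_i in enumerate(x):
                trace[i] = decay * trace[i] + g_h * x_i
        # Output-layer traces: only the row of the chosen action has a gradient.
        for o, trace in enumerate(self.e2):
            for j, h_j in enumerate(hb):
                trace[j] = decay * trace[j] + (h_j if o == a else 0.0)
        # Gradient step along the traces, scaled by the TD error.
        for rows, traces in ((self.w1, self.e1), (self.w2, self.e2)):
            for row, trace in zip(rows, traces):
                for i, e in enumerate(trace):
                    row[i] += alpha * delta * e
        return delta
```

In a training loop one would call `reset_traces()` at the start of each episode and `update(...)` after every control step. Passing a low-dimensional feature vector as the state and using the action index to pick among discretized commands are both assumptions about how such a controller might be wired up.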

Abstract (author's English version)

Passive dynamic walking (PDW) theory regards biped locomotion as a natural characteristic of a biped robot, so energy efficiency can be improved by exploiting the robot's natural dynamics. Because robots differ in mechanical structure, their dynamics differ considerably, so it is not advisable to track the gait of another biped robot or of a person. Q-learning finds the optimal policy through a series of trials and errors, so biped locomotion can be learned through interaction between the robot and the floor, and the robot's natural dynamics can then be used to improve the energy efficiency of the gait. To study biped gait control in more depth, a dynamic simulation platform and a real biped robot are designed.
     1. Robot postures change continuously until an impact occurs. To handle learning over continuous states, a Q-learning controller based on a BP neural network is designed. Instead of a Q-table, a multi-input, multi-output BP neural network computes the Q-values for a continuous state. To handle the temporal credit assignment problem in Q-learning, the eligibility trace algorithm is integrated with the gradient descent method for continuous states. To avoid dimension explosion, an inverted-pendulum pose-energy model is built to reduce the dimension of the input state space. To balance exploration and exploitation in Q-learning, a new ε-greedy method is used whose random-action probability decreases as the step number increases. Simulation results indicate that the proposed method is effective.
     2. To simplify operation and improve simulation efficiency, a biped robot simulation platform is developed. ADAMS is used to build a parametric model library comprising two-link, three-link, four-link, five-link, and seven-link models. Customized menus and graphical user interfaces (GUIs) are developed for loading models, initialization, modifying parameters, and showing simulation results. The interface module ADAMS/Controls makes it easy to co-simulate with ADAMS and MATLAB. With the co-simulation platform, the heavy work of manual modeling is avoided and simulation efficiency is improved.
     3. Based on PDW theory, a 2D quasi-PDW biped robot with 8 degrees of freedom (DOF) is designed. A latch mechanism on the knee allows the support leg to stay upright. A virtual flexible actuator and a DC servo motor are used for the ankle and hip. The control system is a classic hierarchical control system, and a CAN bus is used for fast communication. A GUI is designed for initialization, real-time control, data acquisition, data saving, recovery processing, and so on. The biped robot is intended as a simple, easy-to-use, high-precision platform.
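
Item 1 mentions an ε-greedy scheme whose random-action probability shrinks as the step count grows. The sketch below shows one way such a schedule could look. The exponential form, the function names `epsilon_at` and `select_action`, and the constants are assumptions for illustration, since the abstract does not state the actual decay law.

```python
import random


def epsilon_at(step, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Exploration probability after `step` steps (assumed exponential decay with a floor)."""
    return max(eps_min, eps_start * decay ** step)


def select_action(q_values, step, rng=random):
    """Return (action_index, was_greedy) under a decaying epsilon-greedy rule.

    `rng` only needs `random()` and `randrange(n)`, so the `random` module
    or a seeded `random.Random` instance both work.
    """
    if rng.random() < epsilon_at(step):
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values)), False
    # Exploit: pick the action with the largest estimated Q-value.
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best, True
```

Early in training almost every action is random, which helps the learner avoid settling on a poor gait. Later the choice becomes nearly greedy, while the floor `eps_min` keeps a small amount of exploration.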

