Research on Reinforcement Learning Control Methods for Bionic Underwater Robots
Abstract
Bionic underwater robots have become one of the research hotspots in the field of underwater robotics in recent years. Their complicated dynamics and uncertain working environments make motion control a challenging problem that directly constrains overall performance. Taking a bionic underwater robot propelled by two undulating fins as the research object, this thesis addresses its motion control within the reinforcement learning framework. The work covers the analysis of the motion control problem, the construction of a reinforcement learning algorithm, reinforcement learning based attitude stabilization, reinforcement learning based trajectory tracking, and experimental verification. The main contributions are as follows:
     (1) The motion control problem of the bionic underwater robot with two undulating fins is analyzed systematically from the perspectives of bionic inspiration and the dynamic characteristics of the bionic undulating fin and of the robot itself. The external morphology and swimming behavior of the biological prototypes, Gymnarchus niloticus and the bluespotted ray, are investigated; guided by these bionic insights, a bionic undulating-fin thruster and a combined propulsion and control scheme of "two bionic undulating fins + two swing fins + one 2-DOF bionic swim bladder" are designed. Thrust tests and motion tests are then carried out on the physical devices, and the measured dynamic characteristics guide the design of the robot's motion controllers. (A sketch of typical undulating-fin kinematics follows this paragraph.)
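     For readers unfamiliar with undulating-fin propulsion, the sketch below shows the phase-shifted sinusoidal ray motion commonly used to drive such fins. The function name and all parameter values are illustrative assumptions, not the kinematic model identified in the thesis.

    import numpy as np

    def fin_ray_angles(t, n_rays=10, amp_deg=30.0, freq_hz=1.0, rays_per_wave=10.0):
        # Each fin ray oscillates sinusoidally with a fixed phase lag to its
        # neighbor, so a wave travels along the fin and produces thrust.
        i = np.arange(n_rays)
        phase = 2.0 * np.pi * i / rays_per_wave   # spatial phase lag between rays
        return amp_deg * np.sin(2.0 * np.pi * freq_hz * t - phase)

    # Flipping the sign of `phase` reverses the traveling wave and the thrust.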
     (2) In view of the practical requirements of robot control and the limitations of the basic Q-learning algorithm, a continuous state-action space neural Q-learning algorithm (CSANQL) oriented to real robot control applications is proposed. By synthesizing a feed-forward neural network, a database of learning samples, a fitting function over estimated Q values, and the basic Q-learning algorithm, CSANQL achieves a fast and effective mapping between continuous states and continuous actions. Two implementation structures of neural Q-learning are studied, the mechanism by which the fitting function of estimated Q values generates continuous actions is revealed, the role of the learning-sample database in improving learning efficiency is analyzed, and the way reinforcement learning algorithms are incorporated into the motion control of bionic underwater robots is clarified, laying the foundation for the reinforcement learning control methods studied in this thesis. (A minimal sketch of such an algorithm follows this paragraph.)
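     As a rough illustration of the ingredients named above, the following sketch combines a small feed-forward Q-network, a bounded learning-sample database, and a quadratic fit over Q estimates to select continuous actions. It is a minimal reading of the published description; the network size, update rule, and quadratic fit are assumptions, not the thesis's exact design.

    import numpy as np

    class CSANQLSketch:
        def __init__(self, state_dim, hidden=32, a_low=-1.0, a_high=1.0,
                     gamma=0.95, lr=1e-3, db_size=5000, seed=0):
            self.rng = np.random.default_rng(seed)
            d = state_dim + 1                       # network input = state ++ action
            self.W1 = self.rng.normal(0.0, 0.1, (hidden, d))
            self.b1 = np.zeros(hidden)
            self.W2 = self.rng.normal(0.0, 0.1, hidden)
            self.b2 = 0.0
            self.a_low, self.a_high = a_low, a_high
            self.gamma, self.lr = gamma, lr
            self.db, self.db_size = [], db_size     # learning-sample database

        def q(self, s, a):
            h = np.tanh(self.W1 @ np.append(s, a) + self.b1)
            return self.W2 @ h + self.b2

        def act(self, s, n_candidates=11):
            # Continuous greedy action: fit a quadratic to Q estimates over a
            # few candidate actions and take its vertex when it is a maximum.
            acts = np.linspace(self.a_low, self.a_high, n_candidates)
            qs = np.array([self.q(s, a) for a in acts])
            c2, c1, _ = np.polyfit(acts, qs, 2)
            if c2 < 0.0:
                return float(np.clip(-c1 / (2.0 * c2), self.a_low, self.a_high))
            return float(acts[np.argmax(qs)])       # fall back to best candidate

        def store(self, s, a, r, s_next):
            if len(self.db) >= self.db_size:        # bounded database: drop oldest
                self.db.pop(0)
            self.db.append((s, a, r, s_next))

        def learn(self, batch=16):
            # One TD(0) sweep over a minibatch drawn from the sample database.
            if not self.db:
                return
            idxs = self.rng.choice(len(self.db), min(batch, len(self.db)), replace=False)
            for k in idxs:
                s, a, r, s_next = self.db[k]
                target = r + self.gamma * self.q(s_next, self.act(s_next))
                x = np.append(s, a)
                h = np.tanh(self.W1 @ x + self.b1)
                err = self.q(s, a) - target         # gradient of 0.5 * err**2
                gh = err * self.W2 * (1.0 - h ** 2)
                self.W2 -= self.lr * err * h
                self.b2 -= self.lr * err
                self.W1 -= self.lr * np.outer(gh, x)
                self.b1 -= self.lr * gh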
     (3) For the attitude stabilization problem of the bionic underwater robot, three reinforcement learning based stabilization methods are proposed and implemented at the two levels of learning optimization and learning control: reinforcement learning adaptive PID control, reinforcement learning control, and supervised reinforcement learning control. The reinforcement learning based parameter adaptation mechanism is studied, the important roles of the learning-sample database and of supervisory control are analyzed, and the validity of the methods for attitude stabilization is verified in simulation. The results show that the reinforcement learning adaptive PID controller actively learns the optimal PID parameters and stabilizes the attitude well; that the performance of the CSANQL-based reinforcement learning controller depends on the learning-sample database, reaching the stabilization goal when the database capacity is chosen appropriately; and that the introduction of supervisory control accelerates convergence and keeps the output actions stable, especially in the early learning stage, so the supervised reinforcement learning controller outperforms the other two methods. (A sketch of these two mechanisms follows this paragraph.)
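     The following sketch illustrates, under stated assumptions, the two mechanisms named above: a PID controller whose gains are the learning agent's action, and a supervisor-learner blend whose supervisor weight decays over episodes. The gain handling and the linear handover schedule are illustrative, not the thesis's exact formulas.

    class AdaptivePID:
        # PID controller whose gains (kp, ki, kd) are supplied at every step
        # by the reinforcement learner rather than fixed at design time.
        def __init__(self, dt=0.05):
            self.dt, self.integral, self.prev_err = dt, 0.0, 0.0

        def control(self, err, gains):
            kp, ki, kd = gains                      # the learner's action
            self.integral += err * self.dt
            deriv = (err - self.prev_err) / self.dt
            self.prev_err = err
            return kp * err + ki * self.integral + kd * deriv

    def supervised_action(a_rl, a_supervisor, episode, handover=200):
        # The supervisory controller dominates early learning and is phased
        # out linearly, keeping early output actions stable.
        w = max(0.0, 1.0 - episode / handover)
        return w * a_supervisor + (1.0 - w) * a_rl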
     (4) For the trajectory tracking problem of the bionic underwater robot, a behavior control structure based on reinforcement learning behaviors is proposed and implemented. Three basic control behaviors, thrusting, yawing, and depth keeping, are extracted from complex trajectory tracking tasks as the building blocks of arbitrary tracking tasks; each basic behavior is implemented with the reinforcement learning control method, and a reinforcement learning based optimization method for combining the behaviors is proposed. Simulations on straight-line and curved trajectories in 3-D space show that the behavior control structure responds quickly to the target trajectory and maintains good tracking performance even in complex multi-channel tracking tasks. (A sketch of the behavior combination follows this paragraph.)
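     As a hedged illustration of the behavior combination, the sketch below superposes the three basic behaviors with task-dependent weights, the quantity a higher-level learner would optimize. The stub behaviors, the four-channel actuator layout, and the weighted-sum rule are all assumptions.

    import numpy as np

    ACT_DIM = 4   # e.g. left fin, right fin, swing fins, bladder (an assumption)

    # Stubs standing in for the three reinforcement-learning basic behaviors.
    def thrust_behavior(state): return np.array([0.5, 0.5, 0.0, 0.0])
    def yaw_behavior(state):    return np.array([0.2, -0.2, 0.0, 0.0])
    def depth_behavior(state):  return np.array([0.0, 0.0, 0.3, 0.1])

    def combine_behaviors(state, weights):
        # Weighted superposition of the basic behaviors; the weights are what
        # the higher-level learner tunes for each tracking task.
        cmd = (weights["thrust"] * thrust_behavior(state)
               + weights["yaw"] * yaw_behavior(state)
               + weights["depth"] * depth_behavior(state))
        return np.clip(cmd, -1.0, 1.0)

    cmd = combine_behaviors(state=None, weights={"thrust": 1.0, "yaw": 0.4, "depth": 0.6})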
     (5) Based on the bionic underwater robot test system developed by our research group, experiments on the proposed reinforcement learning control methods are carried out, further verifying their effectiveness in both attitude stabilization and trajectory tracking. The experimental results show that the CSANQL-based supervised reinforcement learning controller stabilizes the attitude better than a pure reinforcement learning controller or a conventional PID controller, and that, under the reinforcement learning behavior control structure, the robot tracks the prescribed trajectories well.
     The work above is a useful exploration of the motion control of bionic underwater robots and of the practical application of reinforcement learning control methods, and it lays a foundation for eventually realizing efficient autonomous motion control of bionic underwater robots within the reinforcement learning framework.