Abstract
Deep reinforcement learning excels at solving optimization problems in control. In continuous-action control, precision requirements cause the number of candidate actions to grow exponentially with the action dimension, so continuous actions are difficult to represent with discrete ones. The Deep Deterministic Policy Gradient (DDPG) algorithm, built on the Actor-Critic framework, solves the continuous-action control problem, but it still has shortcomings: its sampling strategy lacks theoretical guidance, and when the action dimension is high, the gap between optimal and non-optimal actions is neglected. To address these problems, this paper proposes an improved DDPG algorithm with optimized sampling and precise evaluation, and applies it to a simulation environment for a Selective Compliance Assembly Robot Arm (SCARA). Compared with the original DDPG algorithm, the improved algorithm achieves better results and enables fast, automatic positioning of the SCARA robot.