Off-policy Reinforcement Learning for Robust Control of Discrete-time Uncertain Linear Systems
Abstract
In this paper, an off-policy reinforcement learning (RL) method is developed for the design of robust stabilizing controllers for discrete-time uncertain linear systems. The proposed robust control design consists of two steps. First, the robust control problem is transformed into an optimal control problem. Second, the off-policy RL method is used to design an optimal control policy that guarantees robust stability of the original uncertain system. The condition under which the robust control problem and the optimal control problem are equivalent is discussed. The off-policy RL method requires no knowledge of the system dynamics and efficiently reuses the data collected online to successively improve the approximate optimal control policy at each iteration. Finally, a simulation example is carried out to verify the effectiveness of the presented algorithm for the robust control problem of discrete-time linear systems with uncertainty.
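
The abstract gives no implementation details, so the following is a minimal, self-contained sketch of the kind of data-driven design it describes: the robust problem is recast as optimal control of the nominal system with a modified state weight, and the optimal gain is then learned by off-policy policy iteration on a quadratic Q-function, reusing one batch of exploratory data at every iteration. Everything concrete here is an assumption for illustration: the matrices A, B, Qc, R; the uncertainty bound F with the modified weight Qbar = Qc + F'F (a common choice in the robust-to-optimal transformation literature, not necessarily the paper's exact equivalence condition); the random behavior policy; and the initial gain K = 0, which is admissible only because the assumed A is stable.

    import numpy as np

    # --- Hypothetical nominal system and weights (illustration only) ---
    A = np.array([[0.95, 0.10],
                  [0.00, 0.90]])        # assumed stable nominal dynamics
    B = np.array([[0.0],
                  [0.1]])
    Qc = np.eye(2)                      # nominal state weight
    R = np.eye(1)                       # control weight
    F = 0.2 * np.eye(2)                 # assumed uncertainty bound: ||dA x|| <= ||F x||
    Qbar = Qc + F.T @ F                 # modified weight of the equivalent optimal problem

    n, m = B.shape

    def phi(z):
        # Feature vector so that z' H z = theta . phi(z) for symmetric H
        # (upper-triangular parameterization; off-diagonal products doubled).
        scale = 2.0 - np.eye(len(z))
        return (np.outer(z, z) * scale)[np.triu_indices(len(z))]

    def unvech(theta, d):
        # Rebuild the symmetric matrix H from its upper-triangular parameters.
        H = np.zeros((d, d))
        H[np.triu_indices(d)] = theta
        return H + H.T - np.diag(np.diag(H))

    # --- Collect one batch of data with an exploratory behavior policy ---
    rng = np.random.default_rng(0)
    x = np.array([1.0, -1.0])
    data = []
    for _ in range(200):
        u = rng.normal(size=m)          # behavior policy: pure exploration
        x_next = A @ x + B @ u          # model used only to generate the data set
        r = x @ Qbar @ x + u @ R @ u    # stage cost of the equivalent optimal problem
        data.append((x, u, r, x_next))
        x = x_next

    # --- Off-policy policy iteration: the same batch is reused every iteration ---
    K = np.zeros((m, n))                # initial admissible gain (A assumed stable)
    for _ in range(20):
        rows, rhs = [], []
        for xk, uk, rk, xk1 in data:
            zk = np.concatenate([xk, uk])
            zk1 = np.concatenate([xk1, -K @ xk1])  # target-policy action at next state
            rows.append(phi(zk) - phi(zk1))        # Bellman equation, linear in H
            rhs.append(rk)
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        H = unvech(theta, n + m)
        K_new = np.linalg.solve(H[n:, n:], H[n:, :n])  # policy improvement: u = -K x
        if np.linalg.norm(K_new - K) < 1e-8:
            break
        K = K_new

    print("learned gain K =", K)

Because the least-squares Bellman step evaluates the target policy u = -Kx on transitions generated by the exploratory behavior policy, the same 200-sample batch serves every iteration; this data reuse is what distinguishes the off-policy scheme from on-policy variants that must re-collect trajectories after each policy update.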
