Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning
Jing Lai, Junlin Xiong

TL;DR
This paper develops an online reinforcement learning method for average-cost optimal control of stochastic systems, proves that it converges to the optimal solution, and illustrates it with a numerical example.
Contribution
It introduces a novel online, model-free RL algorithm that estimates the Q-function kernel and control gain for stochastic systems with convergence guarantees.
Findings
Convergence of control gain and kernel matrix to optimal values
Effective online learning algorithm demonstrated via numerical example
Applicable to systems with multiplicative and additive noise
Abstract
This paper addresses the average cost minimization problem for discrete-time systems with multiplicative and additive noise via reinforcement learning. Using the Q-function, we propose an online learning scheme that estimates the kernel matrix of the Q-function and updates the control gain using data collected along the system trajectories. The resulting control gain and kernel matrix are proved to converge to the optimal ones. To implement the proposed learning scheme, an online model-free reinforcement learning algorithm is given, in which the recursive least squares method is used to estimate the kernel matrix of the Q-function. A numerical example is presented to illustrate the proposed approach.
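The recursive least squares step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a scalar state x and scalar input u, a quadratic Q-function Q(x, u) = H_xx x^2 + 2 H_xu x u + H_uu u^2, a control law of the form u = K x, and synthetic noise-free targets generated from a made-up kernel `H_TRUE`. In the paper the regression targets would instead come from data along the system trajectories. All function names and numeric values here are hypothetical.

```python
import random

# Hypothetical kernel entries (H_xx, H_xu, H_uu); illustrative only, not from the paper.
H_TRUE = [2.0, 0.5, 1.0]


def rls_update(theta, P, phi, y):
    """One recursive least squares step for the linear model y ~ phi^T theta."""
    n = len(theta)
    # P is symmetric, so P @ phi also serves as (phi^T P)^T below.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta = [theta[i] + gain[i] * err for i in range(n)]
    P = [[P[i][j] - gain[i] * P_phi[j] for j in range(n)] for i in range(n)]
    return theta, P


def quad_features(x, u):
    """Features such that phi^T [H_xx, H_xu, H_uu] equals the quadratic form."""
    return [x * x, 2.0 * x * u, u * u]


def estimate_kernel(num_samples=200, seed=0):
    """Estimate the kernel entries from synthetic samples (assumed setup)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    # Large initial covariance: a weak prior on theta.
    P = [[1e6 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        phi = quad_features(x, u)
        y = sum(h * f for h, f in zip(H_TRUE, phi))  # synthetic target
        theta, P = rls_update(theta, P, phi, y)
    return theta


def greedy_gain(theta):
    """Gain K minimizing the quadratic over u, assuming u = K x."""
    h_xx, h_xu, h_uu = theta
    return -h_xu / h_uu


if __name__ == "__main__":
    theta = estimate_kernel()
    print("estimated kernel entries:", theta)
    print("gain K:", greedy_gain(theta))
```

The gain follows from setting the derivative of the quadratic with respect to u to zero, which gives u = -(H_xu / H_uu) x. In the matrix case the analogous step would use the corresponding blocks of the kernel matrix.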
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Control Systems Optimization
