Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems
Bo Pang, Zhong-Ping Jiang

TL;DR
This paper introduces an off-policy reinforcement learning algorithm for adaptive optimal stationary control of continuous-time linear stochastic systems. The algorithm finds near-optimal policies directly from input/state data, without identifying the system matrices.
Contribution
A novel off-policy reinforcement learning algorithm, optimistic least-squares-based policy iteration, is proposed for continuous-time linear stochastic systems. It optimizes the control policy directly from input/state data.
Findings
The algorithm's solutions converge to a small neighborhood of the optimal solution with probability one, under mild conditions.
A triple inverted pendulum example validates the algorithm's feasibility.
The algorithm handles systems with both additive and multiplicative noises.
Abstract
This paper studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed. Starting from an initial admissible control policy, it iteratively finds near-optimal policies of the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and…
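To make the policy iteration idea in the abstract concrete, the sketch below runs classical model-based policy iteration on a scalar, noise-free linear system. It is not the paper's algorithm: it assumes the system parameters are known and omits both noise terms, whereas the proposed method works from input/state data alone. The function names, the parameters a, b, q, r, and the initial gain k0 are all hypothetical and chosen only for illustration.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iterations=20):
    """Model-based policy iteration for dx/dt = a*x + b*u, u = -k*x,
    with cost integral of (q*x^2 + r*u^2) dt.

    Illustrative assumption: a and b are known and there is no noise.
    Returns the list of (p, k) pairs produced at each iteration.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must be admissible: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iterations):
        # Policy evaluation: solve the scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k^2 = 0 for the cost coefficient p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: the gain that is greedy with respect to p.
        k = b * p / r
        history.append((p, k))
    return history


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Hypothetical unstable plant (a = 1) with a stabilizing initial gain k0 = 3.
    for i, (p, k) in enumerate(policy_iteration_scalar(1.0, 1.0, 1.0, 1.0, 3.0, 5)):
        print(f"iteration {i}: p = {p:.6f}, k = {k:.6f}")
    print(f"Riccati solution: p* = {riccati_scalar(1.0, 1.0, 1.0, 1.0):.6f}")
```

Each pass evaluates the current policy and then improves it. In this simplified setting the cost coefficient decreases toward the Riccati solution, provided the initial gain is admissible (stabilizing), which mirrors the abstract's requirement of an initial admissible control policy.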
Taxonomy
Topics: Adaptive Dynamic Programming Control · Smart Grid Energy Management · Energy Load and Power Forecasting
