Online Adaptive Optimal Control Algorithm Based on Synchronous Integral Reinforcement Learning With Explorations
Lei Guo, Han Zhao

TL;DR
This paper introduces a model-free, synchronous integral reinforcement learning algorithm for continuous-time optimal control, utilizing neural networks and excitation signals to solve Hamilton-Jacobi-Bellman equations without prior system knowledge.
Contribution
The paper proposes a novel synchronous integral Q-learning algorithm that relaxes the requirement of an initial admissible policy and guarantees convergence of the actor and critic networks under a persistence of excitation condition.
Findings
The algorithm solves continuous-time, infinite-horizon optimal control problems for input-affine systems.
Actor and critic neural networks simultaneously approximate the optimal policy and value function.
Numerical simulations confirm the algorithm's effectiveness and convergence.
Abstract
In this paper, we present a novel algorithm named synchronous integral Q-learning, which is based on synchronous policy iteration, to solve continuous-time infinite-horizon optimal control problems for input-affine system dynamics. The integral reinforcement is measured as an excitation signal in this method to estimate the solution to the Hamilton-Jacobi-Bellman equation. Moreover, the proposed method is completely model-free, i.e., no a priori knowledge of the system is required. Using policy iteration, the actor and critic neural networks can simultaneously approximate the optimal value function and policy. The persistence of excitation condition is required to guarantee the convergence of the two networks. Unlike in traditional policy iteration algorithms, the restriction of the initial admissible policy is relaxed in this method. The effectiveness of the proposed algorithm is…
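To make the idea of integral reinforcement concrete, the following minimal sketch runs classic integral policy iteration on a scalar linear-quadratic problem. It is not the paper's synchronous integral Q-learning algorithm: the system dx/dt = a*x + b*u, the cost weights q and r, the interval length T, the helper names, and the single-weight critic V(x) = w*x^2 are all assumptions chosen for illustration. The critic is fitted from the interval relation V(x(t)) - V(x(t+T)) = integral of the running cost over [t, t+T], which needs no drift model. The improvement step here still uses the input gain b, a dependence that a fully model-free method such as the one described in the abstract is meant to avoid.

```python
import numpy as np

# Hypothetical scalar input-affine system dx/dt = a*x + b*u (illustrative values).
a, b = -1.0, 1.0
q, r = 1.0, 1.0  # running cost q*x^2 + r*u^2 (assumed weights)

def simulate(k, x0, T, n=2001):
    """Return x(T) and the integral reinforcement over [0, T] under u = -k*x."""
    t = np.linspace(0.0, T, n)
    x = x0 * np.exp((a - b * k) * t)      # exact closed-loop trajectory (stands in for measured data)
    cost = (q + r * k**2) * x**2          # running cost along the trajectory
    dt = t[1] - t[0]
    rho = dt * (cost.sum() - 0.5 * (cost[0] + cost[-1]))  # trapezoidal rule
    return x[-1], rho

def evaluate_policy(k, starts, T=0.5):
    """Least-squares fit of the critic weight w in V(x) = w*x^2 from interval data."""
    phi, rho = [], []
    for x0 in starts:
        xT, integral = simulate(k, x0, T)
        phi.append(x0**2 - xT**2)         # V(x(t)) - V(x(t+T)) divided by w
        rho.append(integral)              # measured integral reinforcement
    phi, rho = np.array(phi), np.array(rho)
    return float(phi @ rho / (phi @ phi))

k = 0.0  # initial gain; stabilizing here because a < 0
for _ in range(10):
    w = evaluate_policy(k, starts=[1.0, -2.0, 0.5])  # critic update (policy evaluation)
    k = b * w / r                                    # actor update: u = -(b*w/r)*x

# Closed-form solution of the scalar Riccati equation, used only as a check.
p_star = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
print(w, p_star)
```

In this sketch the critic and actor are updated in alternation rather than simultaneously, and the varied starting states play the role that an excitation signal plays in an online setting: without distinct data points the least-squares fit would be ill-conditioned.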
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Frequency Control in Power Systems
