Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems
Junya Ikemoto, Toshimitsu Ushio

TL;DR
This paper introduces a two-stage reinforcement learning method using simulators with multiple system models to stabilize uncertain discrete-time systems, improving learning efficiency and robustness.
Contribution
The proposed approach combines multiple virtual system models with continuous deep Q-learning to enhance policy stability under model uncertainties.
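"Continuous deep Q-learning" commonly refers to normalized advantage functions (NAF), where the Q-function is split into a state value and a quadratic advantage in the action. This makes the greedy action available in closed form for continuous action spaces. This page does not state the exact parameterization the authors use, so the following is only a minimal sketch under that assumption. The network outputs V(x), μ(x), and the entries of a lower-triangular matrix L(x) are passed in directly as plain arrays, and the function name `naf_q_value` is hypothetical.

```python
import numpy as np

def naf_q_value(v, mu, l_raw, u):
    """NAF-style Q-value (assumed form): Q(x,u) = V(x) - 0.5 (u - mu)^T P (u - mu).

    v:     scalar state value V(x)
    mu:    greedy action mu(x), shape (m,)
    l_raw: raw (m, m) network output; its lower triangle builds L(x)
    u:     action to evaluate, shape (m,)
    """
    l_raw = np.asarray(l_raw, dtype=float)
    # Strictly lower part kept as-is; diagonal exponentiated so L has a
    # positive diagonal and P = L L^T is positive definite.
    L = np.tril(l_raw, -1) + np.diag(np.exp(np.diag(l_raw)))
    P = L @ L.T
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    # Advantage is <= 0 and equals 0 only at u = mu, so mu(x) maximizes Q.
    return float(v - 0.5 * d @ P @ d)

# Usage: with l_raw = 0, P is the identity and Q drops quadratically away from mu.
print(naf_q_value(1.0, [0.5], [[0.0]], [0.5]))  # greedy action gives V(x)
print(naf_q_value(1.0, [0.5], [[0.0]], [1.5]))
```

Because P is positive definite, the greedy action is simply μ(x). No inner optimization over actions is needed, which is what makes Q-learning practical for continuous control inputs.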
Findings
Effective stabilization demonstrated in numerical simulations.
Improved learning efficiency over traditional RL methods.
Robustness against identification errors in system parameters.
Abstract
Applications of reinforcement learning (RL) to stabilization problems of real systems are restricted because an agent needs many experiences to learn an optimal policy and may take dangerous actions during its exploration. If we know a mathematical model of a real system, a simulator is useful because it predicts the behavior of the real system using the mathematical model with a given system parameter vector. We can then collect many experiences more efficiently than through interactions with the real system. However, it is difficult to identify the system parameter vector accurately. If there is an identification error, experiences obtained by the simulator may degrade the performance of the learned policy. Thus, we propose a practical RL algorithm that consists of two stages. At the first stage, we choose multiple system parameter vectors. Then, we have a mathematical model for each system…
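The first stage described above, choosing several system parameter vectors and using one simulator model per vector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure. The scalar linear model x_{k+1} = a x_k + b u_k, the uniform sampling within a relative band around the identified estimate, the quadratic-cost reward, and all function names are hypothetical. The abstract is truncated before it explains how the vectors are chosen and how the second stage works, so only shared experience collection across models is shown.

```python
import numpy as np

def sample_parameter_vectors(theta_hat, rel_width, n_models, rng):
    """Hypothetical stage-1 choice: draw parameter vectors uniformly within
    +/- rel_width * |theta_hat| of the identified estimate theta_hat."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    low = theta_hat - rel_width * np.abs(theta_hat)
    high = theta_hat + rel_width * np.abs(theta_hat)
    return rng.uniform(low, high, size=(n_models, theta_hat.size))

def simulate_step(theta, x, u):
    """Hypothetical scalar model x_{k+1} = a * x_k + b * u_k with theta = (a, b)."""
    a, b = theta
    return a * x + b * u

def collect_experiences(thetas, steps, rng, x0_range=1.0, u_range=1.0):
    """Roll out each virtual model with random exploratory inputs and store
    all transitions in one shared replay buffer."""
    buffer = []
    for i, theta in enumerate(thetas):
        x = rng.uniform(-x0_range, x0_range)     # random initial state
        for _ in range(steps):
            u = rng.uniform(-u_range, u_range)   # exploratory action
            x_next = simulate_step(theta, x, u)
            r = -(x ** 2 + 0.1 * u ** 2)         # assumed quadratic-cost reward
            buffer.append((i, x, u, r, x_next))  # i records which model produced it
            x = x_next
    return buffer

rng = np.random.default_rng(0)
thetas = sample_parameter_vectors([0.9, 0.5], 0.2, n_models=5, rng=rng)
buffer = collect_experiences(thetas, steps=10, rng=rng)
print(thetas.shape, len(buffer))
```

Pooling transitions from several perturbed models means a policy trained on this buffer is not fitted to a single, possibly misidentified, parameter vector. That is the intuition behind the robustness claim in the summary above.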
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Control Systems Optimization
Methods: Q-Learning
