A generalized stacked reinforcement learning method for sampled systems
Pavel Osinenko, Dmitrii Dobriborsci, Grigory Yaremenko, Georgiy Malaniya

TL;DR
This paper introduces and benchmarks two reinforcement learning methods tailored for sampled systems, combining model-predictive control with critic learning for continuous-time plants that are controlled at discrete sampling instants.
Contribution
The paper proposes a hybrid RL approach integrating MPC with critic learning for sampled systems, addressing the gap between continuous-time physical systems and digital RL methods.
Findings
Hybrid RL methods outperform traditional approaches in sampled system environments.
The proposed methods demonstrate improved control performance in a mobile robot case study.
The paper includes an optimality analysis of the hybrid approach.
Abstract
A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable for applications such as video games or puzzles, physical systems are continuous in time. A general variant of RL is of digital format, where updates of the value (or cost) and policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, of which sample-and-hold is a specific case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics learning the optimal Q-function and value (or cost-to-go) function. Optimality is analyzed, and a performance comparison is carried out in an experimental case study with a mobile robot.
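The idea of combining MPC with a learned critic in a sample-and-hold loop can be illustrated with a minimal sketch. This is not the paper's algorithm: the scalar dynamics dx/dt = -x + u, the quadratic critic V(x) = w x^2, the action grid, the sampling period, and all function names are assumptions chosen only to keep the example small.

```python
import itertools
import math

DT = 0.1                                  # sampling period (assumed)
ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]     # coarse action grid (assumed)


def step(x, u, dt=DT, substeps=10):
    """Sample-and-hold: keep u constant over one sampling period while
    integrating the toy continuous-time dynamics dx/dt = -x + u (Euler)."""
    h = dt / substeps
    for _ in range(substeps):
        x = x + h * (-x + u)
    return x


def running_cost(x, u):
    return x ** 2 + 0.1 * u ** 2


def critic(x, w):
    """Hypothetical quadratic value-function model V(x) = w * x^2."""
    return w * x ** 2


def mpc_with_critic(x0, w, horizon=3):
    """Enumerate action sequences; minimize the summed running cost over the
    horizon plus the critic evaluated at the final state. Return the first
    action of the best sequence."""
    best_cost, best_u = math.inf, 0.0
    for seq in itertools.product(ACTIONS, repeat=horizon):
        x, total = x0, 0.0
        for u in seq:
            total += running_cost(x, u) * DT
            x = step(x, u)
        total += critic(x, w)
        if total < best_cost:
            best_cost, best_u = total, seq[0]
    return best_u


def td_update(w, x, u, x_next, lr=0.5):
    """One gradient step on the squared temporal-difference error
    V(x) - cost * DT - V(x_next) with respect to the critic weight w."""
    td = critic(x, w) - running_cost(x, u) * DT - critic(x_next, w)
    grad = td * (x ** 2 - x_next ** 2)
    return w - lr * grad


def run(x0=1.0, w0=0.0, steps=50):
    """Closed loop: act by MPC, observe the sampled state, update the critic."""
    x, w = x0, w0
    for _ in range(steps):
        u = mpc_with_critic(x, w)
        x_next = step(x, u)
        w = td_update(w, x, u, x_next)
        x = x_next
    return x, w
```

Calling `run()` returns the final sampled state and critic weight. The critic serves as a terminal cost so that a short prediction horizon can still account for cost beyond it, which is one common way to pair MPC with value learning.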
Taxonomy
Topics: Reinforcement Learning in Robotics
