A stabilizing reinforcement learning approach for sampled systems with partially unknown models
Lukas Beckenbach, Pavel Osinenko, Stefan Streif

TL;DR
This paper introduces a reinforcement learning method that guarantees practical stability for sampled systems with partially unknown models by integrating classical adaptive control techniques in an online setting, demonstrated in vehicle control tasks.
Contribution
The paper presents a novel online reinforcement learning approach that ensures practical stability with only partial model knowledge, combining classical adaptive control with modern RL in a sampled-data framework.
Findings
Significantly reduces control cost in adaptive traction and cruise control.
Guarantees practical stability in partially unknown systems.
Effective in a digital, sampled control environment.
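To make the "sampled" setting in the findings above concrete, here is a minimal sketch of a sample-and-hold loop: the control is computed only at sampling instants and held constant in between. This is not the paper's method. The scalar plant dx/dt = a*x + u, the fixed feedback gain k, and the function name simulate_sample_and_hold are all assumptions chosen for illustration, and no learning takes place.

```python
def simulate_sample_and_hold(x0, a=1.0, k=3.0, delta=0.05, t_end=5.0, substeps=50):
    """Simulate dx/dt = a*x + u with u held constant over each sampling interval.

    Hypothetical toy plant and gain, used only to illustrate sample-and-hold control.
    Returns the state at every sampling instant, including the initial state.
    """
    x = x0
    n_samples = int(round(t_end / delta))
    h = delta / substeps  # inner integration step (explicit Euler)
    trajectory = [x]
    for _ in range(n_samples):
        u = -k * x  # control computed once, at the sampling instant
        for _ in range(substeps):
            x += h * (a * x + u)  # plant evolves while u stays fixed
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    fast = simulate_sample_and_hold(1.0, delta=0.05)
    slow = simulate_sample_and_hold(1.0, delta=1.0)
    print("final |x| with delta=0.05:", abs(fast[-1]))
    print("final |x| with delta=1.0: ", abs(slow[-1]))
```

In this toy setup the same gain drives the state toward the origin for a short sampling time and fails to do so for a long one. That is why stability statements for sampled systems are typically tied to a bound on the sampling time.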
Abstract
Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in a model-free or model-based fashion, using a priori or online collected system data to train the parametric architectures involved. In general, online reinforcement learning does not guarantee closed-loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The…
Taxonomy
Topics: Smart Grid Energy Management · Adaptive Dynamic Programming Control · Cardiovascular Function and Risk Factors
