A Tour of Reinforcement Learning: The View from Continuous Control
Benjamin Recht

TL;DR
This survey examines reinforcement learning for continuous control. It compares competing methods through theory and a case study of LQR with unknown dynamics, and highlights both the importance of models and the open challenges in safe, reliable learning.
Contribution
It surveys reinforcement learning for continuous control from the perspective of optimization and control, combines techniques from learning theory and control, and analyzes LQR with unknown dynamics as a detailed case study.
Findings
Non-asymptotic characterizations of LQR performance tend to match experimental behavior.
Theory and experiment both point to the importance of models and to the cost of generality in reinforcement learning algorithms.
Challenges remain in designing safe, reliable learning systems for complex environments.
Abstract
This manuscript surveys reinforcement learning from the perspective of optimization and control with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best-studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate…
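As a rough illustration of the LQR case study mentioned in the abstract, the sketch below shows one simple model-based approach to LQR with unknown dynamics: estimate (A, B) by least squares from a rollout driven by random inputs, then compute the LQR gain for the estimated model. This is an assumption-laden toy, not necessarily the procedure analyzed in the survey. The function names, the 2-state system, the noise level, and the cost matrices are all hypothetical choices made for the example.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500):
    """Gain K for the policy u = -K x, via fixed-point iteration on the
    discrete-time Riccati equation (infinite-horizon LQR)."""
    P = Q.copy()
    for _ in range(iters):
        # Gain implied by the current cost-to-go matrix P.
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA.
        P = Q + A.T @ P @ (A - B @ G)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def estimate_dynamics(A_true, B_true, n_steps, noise_std, rng):
    """Least-squares estimate of (A, B) from one rollout of
    x_{t+1} = A x_t + B u_t + w_t with i.i.d. Gaussian inputs."""
    n, m = B_true.shape
    x = np.zeros(n)
    regressors, targets = [], []
    for _ in range(n_steps):
        u = rng.standard_normal(m)
        w = noise_std * rng.standard_normal(n)
        x_next = A_true @ x + B_true @ u + w
        regressors.append(np.concatenate([x, u]))
        targets.append(x_next)
        x = x_next
    # Solve min_Theta || Z Theta - X_next ||_F; Theta.T stacks [A B].
    theta, *_ = np.linalg.lstsq(
        np.array(regressors), np.array(targets), rcond=None
    )
    AB = theta.T
    return AB[:, :n], AB[:, n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stable 2-state, 1-input system (an assumption for the demo).
    A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
    B_true = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)

    A_hat, B_hat = estimate_dynamics(A_true, B_true, 500, 0.1, rng)
    K_hat = lqr_gain(A_hat, B_hat, Q, R)

    # Check whether the gain designed on the estimate stabilizes the true system.
    rho = max(abs(np.linalg.eigvals(A_true - B_true @ K_hat)))
    print("estimated A:\n", A_hat)
    print("closed-loop spectral radius on the true system:", rho)
```

The gain is designed on the estimated model but evaluated on the true one, which is the gap that a non-asymptotic analysis has to account for: with few samples the estimate can be poor enough that the resulting controller performs badly or fails to stabilize the system.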
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Advanced Bandit Algorithms Research
