Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: Generalized Baselines
Benjamin Schiffer, Lucas Janson

TL;DR
This paper develops a theoretical framework for safe online reinforcement learning in the Linear Quadratic Regulator, demonstrating that nonlinear controllers can achieve sublinear regret under safety constraints. The analysis is 1-dimensional, and the authors discuss how they expect the takeaways to generalize to higher dimensions.
Contribution
It introduces a general framework for analyzing nonlinear controllers in safety-constrained RL, establishing regret bounds and a new uncertainty estimation method for such controllers.
Findings
Nonlinear controllers can achieve \tilde{O}_T(\sqrt{T}) regret under safety constraints.
Safety enforcement with sufficient noise enables 'free exploration' in constrained RL.
The analysis covers 1-dimensional state and action spaces; the authors expect the high-level takeaways to generalize to higher dimensions.
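One way to build intuition for the "free exploration" finding is the following minimal sketch. It is an illustration under stated assumptions, not the paper's estimator or proof technique. It assumes 1-dimensional dynamics x_{t+1} = a*x_t + b*u_t + w_t with bounded uniform noise, and a hypothetical safety rule that clips a linear control so the noise-free next position stays in [-bound, bound]. All names and parameter values are made up for the example. Under a purely linear control u = -k*x, the regressors (x, u) are collinear, so a and b cannot be separated by least squares; once clipping makes the control nonlinear in x, the collinearity disappears.

```python
import random


def clipped_control(x, k, a, b, bound):
    """Linear feedback -k*x, clipped so that a*x + b*u lies in [-bound, bound].

    Hypothetical safety rule for illustration; assumes b > 0.
    """
    u = -k * x
    u_low = (-bound - a * x) / b
    u_high = (bound - a * x) / b
    return min(max(u, u_low), u_high)


def collect(a, b, k, bound, horizon, w_max, clip, seed=0):
    """Roll out the system and return a list of (x, u) regressor pairs."""
    rng = random.Random(seed)
    x, data = 0.0, []
    for _ in range(horizon):
        u = clipped_control(x, k, a, b, bound) if clip else -k * x
        data.append((x, u))
        x = a * x + b * u + rng.uniform(-w_max, w_max)
    return data


def normalized_gram_det(data):
    """Determinant of the 2x2 Gram matrix of (x, u), scaled to [0, 1].

    A value of 0 means x and u are exactly collinear, so least squares
    cannot identify a and b separately.
    """
    sxx = sum(x * x for x, _ in data)
    suu = sum(u * u for _, u in data)
    sxu = sum(x * u for x, u in data)
    return (sxx * suu - sxu * sxu) / (sxx * suu)


if __name__ == "__main__":
    for clip in (False, True):
        data = collect(a=1.0, b=1.0, k=0.1, bound=0.5,
                       horizon=5000, w_max=0.5, clip=clip)
        print("clip =", clip, "normalized det =", normalized_gram_det(data))
```

The unclipped run should report a determinant that is zero up to rounding, while the clipped run should report a clearly positive value, which is the sense in which the safety rule itself supplies excitation in this toy setting.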
Abstract
Many practical applications of online reinforcement learning require the satisfaction of safety constraints while learning about the unknown environment. In this work, we establish theoretical foundations for reinforcement learning with safety constraints by studying the canonical problem of Linear Quadratic Regulator learning with unknown dynamics, but with the additional constraint that the position must stay within a safe region for the entire trajectory with high probability. Our primary contribution is a general framework for studying stronger baselines of nonlinear controllers that are better suited for constrained problems than linear controllers. Due to the difficulty of analyzing nonlinear controllers in a constrained problem, we focus on 1-dimensional state and action spaces; however, we also discuss how we expect the high-level takeaways to generalize to higher dimensions.…
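The setting described in the abstract can be sketched in a few lines. This is a simplified illustration, not the paper's algorithm: it assumes the dynamics are known, uses bounded uniform noise so that safety holds on every step rather than with high probability, and takes the safe region to be the symmetric interval [-1, 1]. The controller names, the cost weights q and r, and all numeric values are assumptions chosen for the example. The truncated controller clips a linear feedback so that the noise-free next position stays at least w_max inside the safe region.

```python
import random


def linear_controller(x, k):
    """Plain linear feedback u = -k * x."""
    return -k * x


def truncated_linear_controller(x, k, a, b, bound):
    """Linear feedback clipped so that a*x + b*u lies in [-bound, bound].

    Illustrative nonlinear controller; assumes b > 0.
    """
    u = -k * x
    u_low = (-bound - a * x) / b
    u_high = (bound - a * x) / b
    return min(max(u, u_low), u_high)


def simulate(controller, a, b, q, r, horizon, w_max, seed=0):
    """Run x_{t+1} = a*x_t + b*u_t + w_t with w_t uniform on [-w_max, w_max].

    Returns the average quadratic cost and the largest |x| reached.
    """
    rng = random.Random(seed)
    x, total_cost, max_abs_x = 0.0, 0.0, 0.0
    for _ in range(horizon):
        u = controller(x)
        total_cost += q * x * x + r * u * u
        x = a * x + b * u + rng.uniform(-w_max, w_max)
        max_abs_x = max(max_abs_x, abs(x))
    return total_cost / horizon, max_abs_x


if __name__ == "__main__":
    a, b, k, w_max, safe_limit = 1.0, 1.0, 0.3, 0.5, 1.0
    bound = safe_limit - w_max  # margin that absorbs the worst-case noise
    controllers = {
        "linear": lambda x: linear_controller(x, k),
        "truncated": lambda x: truncated_linear_controller(x, k, a, b, bound),
    }
    for name, ctrl in controllers.items():
        cost, max_abs_x = simulate(ctrl, a, b, q=1.0, r=1.0,
                                   horizon=5000, w_max=w_max)
        print(name, "avg cost:", cost, "max |x|:", max_abs_x)
```

With these assumed values the truncated controller keeps |x| within the safe limit by construction, whereas the plain linear controller has no such guarantee. In the paper's actual problem the dynamics are unknown and must be learned, which this sketch does not attempt.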
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Smart Grid Security and Resilience
