Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: Generalized Baselines

Benjamin Schiffer; Lucas Janson

arXiv:2410.21081 · stat.ML · April 30, 2025

TL;DR

This paper develops a theoretical framework for safe online reinforcement learning in the Linear Quadratic Regulator, showing that sublinear regret relative to nonlinear baseline controllers is achievable under safety constraints, and discussing how the takeaways may extend to higher-dimensional systems.

Contribution

It introduces a general framework for using nonlinear controllers as baselines in safety-constrained RL, establishing regret bounds against such baselines and a new uncertainty estimation bound for nonlinear controls.

Findings

01

Against nonlinear baseline controllers, $\tilde{O}_T(\sqrt{T})$ regret is achievable under safety constraints when the noise distribution has sufficiently large support.

02

Safety enforcement with sufficient noise enables 'free exploration' in constrained RL.

03

Framework potentially extends to higher-dimensional systems beyond 1D.
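The findings above are stated in terms of regret. A standard way to write this quantity for the 1-D constrained problem is sketched below. It assumes scalar dynamics with coefficients $a, b$, noise $w_t$, cost weights $q, r$, a symmetric safe interval $[-D, D]$, and a class $\Pi$ of safe baseline controllers, where $x^{\pi}_t$ is the trajectory that baseline $\pi$ would produce. These symbols are illustrative, and the paper's exact definitions (e.g., expected vs. high-probability regret, or an asymmetric safe region) may differ.

```latex
\begin{aligned}
&x_{t+1} = a\,x_t + b\,u_t + w_t, \qquad c(x, u) = q\,x^2 + r\,u^2, \qquad |x_t| \le D \text{ for all } t,\\
&\mathrm{Regret}(T) = \sum_{t=1}^{T} c(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c\big(x^{\pi}_t, \pi(x^{\pi}_t)\big).
\end{aligned}
```

Sublinear regret means $\mathrm{Regret}(T)/T \to 0$, so the learner's average cost approaches that of the best baseline; a bound such as $\tilde{O}_T(\sqrt{T})$ means regret grows no faster than $\sqrt{T}$ up to logarithmic factors in $T$.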

Abstract

Many practical applications of online reinforcement learning require the satisfaction of safety constraints while learning about the unknown environment. In this work, we establish theoretical foundations for reinforcement learning with safety constraints by studying the canonical problem of Linear Quadratic Regulator learning with unknown dynamics, but with the additional constraint that the position must stay within a safe region for the entire trajectory with high probability. Our primary contribution is a general framework for studying stronger baselines of nonlinear controllers that are better suited for constrained problems than linear controllers. Due to the difficulty of analyzing non-linear controllers in a constrained problem, we focus on 1-dimensional state and action spaces; however, we also discuss how we expect the high-level takeaways to generalize to higher dimensions.…
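To make the setting in the abstract concrete, here is a minimal simulation sketch. It assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with Gaussian noise, quadratic cost $q x^2 + r u^2$, and a symmetric safe interval $|x| \le D$; all function names and parameter values are hypothetical. The `projected_controller` is one simple example of a nonlinear, safety-aware controller (LQR feedback whose action is adjusted so the noise-free next state stays in a tightened interval). It is not the paper's baseline class or learning algorithm, and the dynamics are assumed known here rather than learned online.

```python
import numpy as np


def lqr_gain(a, b, q, r, iters=500):
    """Unconstrained infinite-horizon LQR gain k for the scalar system,
    computed by fixed-point iteration of the discrete Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)  # optimal unconstrained policy: u = -k * x


def linear_controller(k):
    """Linear state feedback u = -k x."""
    return lambda x: -k * x


def projected_controller(k, a, b, bound):
    """Hypothetical nonlinear controller: use u = -k x unless the noise-free
    next state a x + b u would leave [-bound, bound]; in that case choose u so
    the noise-free next state lands exactly on the nearest edge."""
    def policy(x):
        nominal = a * x + b * (-k * x)
        target = float(np.clip(nominal, -bound, bound))
        return (target - a * x) / b
    return policy


def rollout(policy, a, b, q, r, D, sigma, T, x0=0.0, seed=0):
    """Simulate x_{t+1} = a x_t + b u_t + w_t with w_t ~ N(0, sigma^2).
    Returns the total quadratic cost and whether |x_t| ever exceeded D."""
    rng = np.random.default_rng(seed)
    x, cost, unsafe = x0, 0.0, abs(x0) > D
    for _ in range(T):
        u = policy(x)
        cost += q * x ** 2 + r * u ** 2
        x = a * x + b * u + sigma * rng.standard_normal()
        unsafe = unsafe or abs(x) > D
    return cost, unsafe


if __name__ == "__main__":
    # Illustrative (made-up) parameters: unstable open loop (a > 1), safe set |x| <= D.
    a, b, q, r, D, sigma, T = 1.2, 1.0, 1.0, 1.0, 1.0, 0.1, 2000
    k = lqr_gain(a, b, q, r)
    policies = {
        "linear (LQR)": linear_controller(k),
        "projected": projected_controller(k, a, b, bound=0.3 * D),
    }
    for name, policy in policies.items():
        cost, unsafe = rollout(policy, a, b, q, r, D, sigma, T, x0=0.9 * D)
        print(f"{name:13s} average cost {cost / T:.4f}, left safe set: {unsafe}")
```

In the interior, the projected controller coincides with the unconstrained LQR policy; it deviates (and is therefore nonlinear) only where the LQR action would carry the nominal next state outside the tightened interval. This kind of state-dependent behavior cannot be expressed by a single linear gain.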
