Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems
Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

TL;DR
This paper introduces a model-based reinforcement learning algorithm for unknown stabilizable linear dynamical systems. An improved exploration strategy lets it stabilize the system quickly and attain \(\tilde{O}(\sqrt{T})\) regret with only polynomial dependence on the problem dimensions.
Contribution
The paper presents an exploration method that combines a sophisticated RL exploration policy with isotropic exploration, enabling fast stabilization and an exponential improvement over prior methods in how the regret scales with the problem dimensions.
Findings
Achieves \(\tilde{O}(\sqrt{T})\) regret after \(T\) time steps.
Regret depends only polynomially on the problem dimensions, an exponential improvement over prior methods.
Empirical results show superior performance in adaptive control tasks.
Abstract
In this work, we study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems. When learning a dynamical system, one needs to stabilize the unknown dynamics in order to avoid system blow-ups. We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment with an improved exploration strategy. We show that the proposed algorithm attains \(\tilde{O}(\sqrt{T})\) regret after \(T\) time steps of agent-environment interaction. We also show that the regret of the proposed algorithm has only a polynomial dependence on the problem dimensions, which gives an exponential improvement over the prior methods. Our improved exploration method is simple, yet efficient, and it combines a sophisticated exploration policy in RL with an isotropic exploration strategy to achieve fast stabilization and…
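To make the idea of isotropic exploration concrete, the following is a minimal sketch and not the paper's algorithm. It assumes dynamics of the form x_{t+1} = A x_t + B u_t + w_t with Gaussian noise, adds isotropic Gaussian perturbations to the control input, and recovers (A, B) by regularized least squares. The function names, the example matrices, the zero feedback gain `K`, and the noise scales `sigma_u` and `sigma_w` are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def simulate_with_isotropic_exploration(A, B, K, T, sigma_u=1.0, sigma_w=0.1, seed=0):
    """Roll out x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t + isotropic noise.

    Hypothetical helper: the isotropic Gaussian term excites every input
    direction equally, which is what makes (A, B) identifiable from data.
    """
    rng = np.random.default_rng(seed)
    n, d = B.shape
    x = np.zeros(n)
    Z = np.zeros((T, n + d))     # regressors z_t = [x_t, u_t]
    X_next = np.zeros((T, n))    # targets x_{t+1}
    for t in range(T):
        u = K @ x + sigma_u * rng.standard_normal(d)   # isotropic exploration
        x_new = A @ x + B @ u + sigma_w * rng.standard_normal(n)
        Z[t] = np.concatenate([x, u])
        X_next[t] = x_new
        x = x_new
    return Z, X_next

def ridge_estimate(Z, X_next, n, lam=1e-3):
    """Regularized least-squares estimate of [A, B] from one trajectory."""
    G = Z.T @ Z + lam * np.eye(Z.shape[1])
    Theta = np.linalg.solve(G, Z.T @ X_next).T   # shape (n, n + d)
    return Theta[:, :n], Theta[:, n:]

# Assumed example: a mildly unstable but stabilizable 2-state, 1-input system.
A = np.array([[1.01, 0.1],
              [0.0,  0.9]])
B = np.array([[0.0],
              [1.0]])
K = np.zeros((1, 2))             # no prior stabilizing controller is assumed

Z, X_next = simulate_with_isotropic_exploration(A, B, K, T=500)
A_hat, B_hat = ridge_estimate(Z, X_next, n=2)
estimation_error = np.linalg.norm(np.hstack([A_hat, B_hat]) - np.hstack([A, B]))
print("Frobenius estimation error:", estimation_error)
```

In a full method of the kind the abstract describes, an estimate like this would be used to design a stabilizing controller early on, after which exploration could be handed to a more refined policy. That hand-off is not shown here.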
Taxonomy
Topics: Advanced Bandit Algorithms Research · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
