The Confusing Instance Principle for Online Linear Quadratic Control
Waris Radji (Scool, CRIStAL), Odalric-Ambrym Maillard (Scool, CRIStAL)

TL;DR
This paper introduces MED-LQ, a novel control strategy for linear quadratic regulation (LQR) that leverages the Confusing Instance (CI) principle. MED-LQ achieves competitive performance on control benchmarks and shows potential for scaling to large MDPs.
Contribution
It extends the Confusing Instance and Minimum Empirical Divergence principles to online LQR control, providing a new approach with theoretical and practical advantages.
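To make the MED principle concrete, here is a minimal sketch of the classical MED rule for a multi-armed bandit, not the MED-LQ controller. It assumes Gaussian rewards with a known standard deviation `sigma`, and the function names `med_probabilities` and `med_select` are hypothetical. For each arm, D_a is the divergence from the arm's empirical model to the closest "confusing" alternative in which that arm would have the best mean. MED then plays arm a with probability proportional to exp(-N_a * D_a), where N_a is the arm's pull count.

```python
import math
import random


def med_probabilities(means, counts, sigma=1.0):
    """MED sampling probabilities for Gaussian arms with known std `sigma`.

    Assumption: rewards are Gaussian with a shared, known variance. Then the
    minimum divergence from arm a's empirical model to a model in which a is
    optimal is KL(N(mu_a, s^2) || N(mu_best, s^2)) = (mu_best - mu_a)^2 / (2 s^2).
    """
    best = max(means)
    divs = [(best - m) ** 2 / (2 * sigma ** 2) for m in means]
    # Weight exp(-N_a * D_a): arms whose confusing instance is well refuted
    # by many samples become exponentially unlikely to be played.
    weights = [math.exp(-n * d) for n, d in zip(counts, divs)]
    total = sum(weights)  # >= 1, since the empirical best arm has D = 0
    return [w / total for w in weights]


def med_select(means, counts, rng, sigma=1.0):
    """Draw one arm index according to the MED probabilities."""
    probs = med_probabilities(means, counts, sigma)
    return rng.choices(range(len(means)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(med_probabilities([1.0, 0.5], [10, 10]))
    print(med_select([1.0, 0.5], [10, 10], rng))
```

With empirical means 1.0 and 0.5 after 10 pulls each, the second arm still receives probability exp(-1.25) / (1 + exp(-1.25)), roughly 0.22. This is the exploration pressure that comes from not yet having ruled out the confusing instance.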
Findings
MED-LQ achieves competitive performance across various benchmarks.
The approach demonstrates potential scalability to large-scale MDPs.
It offers a new theoretical framework for model-based reinforcement learning in control.
Abstract
We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics with model-based reinforcement learning. Traditional methods like Optimism in the Face of Uncertainty and Thompson Sampling, rooted in multi-armed bandits (MABs), face practical limitations. In contrast, we propose an alternative based on the Confusing Instance (CI) principle, which underpins regret lower bounds in MABs and discrete Markov Decision Processes (MDPs) and is central to the Minimum Empirical Divergence (MED) family of algorithms, known for their asymptotic optimality in various settings. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop MED-LQ. This novel control strategy extends the principles of CI and MED beyond small-scale settings. Our benchmarks on a comprehensive control suite demonstrate that MED-LQ achieves competitive…
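The "structure of LQR policies" referred to above is that, for a known model, the optimal controller is a linear state feedback u = -K x, with K obtained from the discrete-time algebraic Riccati equation. The sketch below illustrates this on a scalar system only. The model x_{t+1} = a x_t + b u_t + noise, the stage cost q x^2 + r u^2, and the function name `scalar_lqr` are illustrative assumptions. This is the standard certainty-equivalent computation, not the MED-LQ algorithm itself.

```python
def scalar_lqr(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Solve the scalar discrete-time Riccati equation by fixed-point iteration.

    Assumed model: x_{t+1} = a*x_t + b*u_t + noise, stage cost q*x^2 + r*u^2,
    with q > 0, r > 0 and (a, b) stabilizable.
    Returns (P, K) such that the optimal policy is u = -K * x.
    """
    p = q
    for _ in range(max_iter):
        # Riccati update: P <- q + a^2 P - (a b P)^2 / (r + b^2 P)
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    else:
        raise RuntimeError("Riccati iteration did not converge")
    k = a * b * p / (r + b * b * p)
    return p, k


if __name__ == "__main__":
    # An open-loop unstable system (|a| > 1) is stabilized by the LQR gain.
    p, k = scalar_lqr(a=2.0, b=1.0, q=1.0, r=1.0)
    print(p, k, abs(2.0 - 1.0 * k) < 1.0)
```

For a = b = q = r = 1 the fixed point is P = (1 + sqrt(5)) / 2 and K = P / (1 + P), giving a closed-loop factor a - bK of about 0.38. A model-based learner would call such a routine on estimated (a, b). How sensitive K is to errors in those estimates, and whether the closed loop stays stable, is the kind of question the sensitivity and stability analysis above addresses.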
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
