The Confusing Instance Principle for Online Linear Quadratic Control

Waris Radji (Scool; CRIStAL); Odalric-Ambrym Maillard (Scool; CRIStAL)

arXiv:2510.19531·cs.LG·October 23, 2025

TL;DR

This paper introduces MED-LQ, a control strategy for online linear quadratic regulation built on the Confusing Instance principle. The authors report competitive benchmark performance and potential applicability to large-scale MDPs.

Contribution

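The paper extends the Confusing Instance and Minimum Empirical Divergence principles to online LQR control, providing an alternative to Optimism in the Face of Uncertainty and Thompson Sampling that the authors present as having both theoretical and practical advantages.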

Findings

01. MED-LQ achieves competitive performance across various benchmarks.
02. The approach demonstrates potential scalability to large-scale MDPs.
03. It offers a new theoretical framework for model-based reinforcement learning in control.

Abstract

We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics with model-based reinforcement learning. Traditional methods like Optimism in the Face of Uncertainty and Thompson Sampling, rooted in multi-armed bandits (MABs), face practical limitations. In contrast, we propose an alternative based on the Confusing Instance (CI) principle, which underpins regret lower bounds in MABs and discrete Markov Decision Processes (MDPs) and is central to the Minimum Empirical Divergence (MED) family of algorithms, known for their asymptotic optimality in various settings. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop MED-LQ. This novel control strategy extends the principles of CI and MED beyond small-scale settings. Our benchmarks on a comprehensive control suite demonstrate that MED-LQ achieves competitive…
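For readers unfamiliar with the setting, the following is a minimal background sketch of the standard discrete-time LQR problem with known dynamics: x_{t+1} = A x_t + B u_t with stage cost x_t' Q x_t + u_t' R u_t. It is not MED-LQ and not the authors' code. It only illustrates the structure of the LQR policy (a linear state feedback u_t = -K x_t) that the abstract refers to. The function name `lqr_gain` and the example matrices (a double integrator with step 0.1) are assumptions chosen for illustration.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=10000, tol=1e-10):
    """Iterate the discrete-time Riccati recursion to a fixed point P.

    Returns (K, P), where the resulting policy is u_t = -K x_t.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # Gain associated with the current value matrix P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update: P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical example system: double integrator with time step 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = lqr_gain(A, B, Q, R)
# Spectral radius of the closed loop A - BK; a value below 1 means stability
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print("K =", K)
print("closed-loop spectral radius =", rho)
```

In the online setting the paper studies, A and B are unknown and must be estimated from data, so the gain would be computed from estimated dynamics rather than the true matrices used here.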

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control