Balancing Exploration for Online Receding Horizon Learning Control with Provable Regret Guarantees
Deepan Muthirayan, Jianjun Yuan, Pramod P. Khargonekar

TL;DR
This paper introduces a novel online receding horizon control algorithm that balances exploration and exploitation in unknown linear systems, achieving provable sub-linear regret and constraint satisfaction guarantees.
Contribution
It proposes a new exploration method for receding horizon control that ensures persistent excitation and sub-linear regret bounds in unknown linear systems.
Findings
Regret bounded by O(T^{3/4})
Ensures persistent excitation for exploration
Balances exploration and exploitation effectively
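To illustrate why persistent excitation matters here, the following is a minimal sketch, not the paper's algorithm. It assumes a small hypothetical system (the matrices `A`, `B`), a fixed linear feedback gain `K` standing in for the receding horizon controller, and a Gaussian perturbation whose scale is set by the hypothetical parameter `perturb_scale`. It accumulates the Gram matrix of the stacked state-input vectors and returns its smallest eigenvalue, which must be bounded away from zero for least-squares identification of the system to be well posed.

```python
import numpy as np

def excitation_level(perturb_scale, T=200, seed=0):
    """Smallest eigenvalue of sum_t z_t z_t^T, where z_t = [x_t; u_t].

    All system matrices and the gain K are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.2], [0.0, 0.8]])   # assumed "unknown" dynamics
    B = np.array([[0.0], [1.0]])
    K = np.array([[-0.1, -0.3]])             # stand-in for the nominal controller
    x = np.array([1.0, -1.0])
    gram = np.zeros((3, 3))
    for _ in range(T):
        # Nominal control plus an exploratory perturbation.
        u = K @ x + perturb_scale * rng.standard_normal(1)
        z = np.concatenate([x, u])
        gram += np.outer(z, z)
        x = A @ x + B @ u
    return float(np.linalg.eigvalsh(gram)[0])

no_exploration = excitation_level(0.0)
with_exploration = excitation_level(0.5)
print(no_exploration, with_exploration)
```

Without the perturbation, the input is an exact linear function of the state, so the Gram matrix is rank deficient and its smallest eigenvalue is numerically zero. With the perturbation it becomes strictly positive, at the price of extra control cost; this is the exploration-exploitation trade-off the summary refers to.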
Abstract
We address the problem of simultaneous learning and control in an online receding horizon control setting. We consider the control of an unknown linear dynamical system with general cost functions and affine constraints on the control input. Our goal is to develop an online learning algorithm that minimizes the dynamic regret, defined as the difference between the cumulative cost incurred by the algorithm and that of the best policy that has full knowledge of the system, cost functions, and state and that satisfies the control input constraints. We propose a novel approach to exploration in an online receding horizon setting. The key challenge is to ensure that the control generated by the receding horizon controller is persistently exciting. Our approach is to apply a perturbation to the control input generated by the receding horizon controller that balances the…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Control Systems Optimization
