Continuous Control with Contexts, Provably
Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

TL;DR
This paper introduces a provably efficient algorithm for building a decoder, a map from an environment's context to a policy, for continuous control tasks such as the linear quadratic regulator (LQR). The decoder lets an agent adapt to unseen environments; the guarantees are theoretical and are supported by experiments.
Contribution
It presents the first provably efficient algorithm for decoder construction in continuous control, combining UCB-based exploration with theoretical regret bounds.
Findings
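The algorithm achieves an $\tilde{O}(\sqrt{T})$ regret bound for online environment adaptation, where $T$ is the number of environments played.
The agent can transfer learned knowledge to unseen environments after playing $\tilde{O}(1/\epsilon^2)$ environments, obtaining an $\epsilon$-suboptimal policy for future environments.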
Experimental results validate the effectiveness of the proposed method.
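The link between a sublinear regret bound and the number of environments needed before transfer can be sketched with a standard averaging argument. This is only an illustration, not the paper's proof: the symbols $J_t$, $\pi_t$, $J_t^\star$ and the constant $C$ (which absorbs logarithmic factors) are notation assumed here, and the argument implicitly assumes environments are drawn from a fixed distribution.

```latex
% Assumed notation: J_t(\pi_t) is the cost incurred in environment t,
% J_t^\star is the optimal cost for that environment.
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \bigl( J_t(\pi_t) - J_t^\star \bigr) \;\le\; C\sqrt{T}
\]
% Dividing by T bounds the average suboptimality per environment:
\[
  \frac{1}{T} \sum_{t=1}^{T} \bigl( J_t(\pi_t) - J_t^\star \bigr) \;\le\; \frac{C}{\sqrt{T}}
\]
% The average falls below a target accuracy \epsilon once
\[
  \frac{C}{\sqrt{T}} \le \epsilon
  \quad\Longleftrightarrow\quad
  T \;\ge\; \frac{C^{2}}{\epsilon^{2}}
\]
```

Read this way, a $\sqrt{T}$-type regret bound and a $1/\epsilon^2$-type environment count are two views of the same rate.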
Abstract
A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes the context of the unseen new environment as input and generates a policy accordingly. The current paper studies how to build a decoder for the fundamental continuous control task, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses an upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys an $\tilde{O}(\sqrt{T})$ regret bound in the online setting, where $T$ is the number of environments the agent played. This also implies that after playing $\tilde{O}(1/\epsilon^2)$ environments, the agent…
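For readers unfamiliar with LQR, the following is a minimal sketch of the underlying control problem for a single, fully known environment. It is not the paper's algorithm: contexts, the decoder, and UCB exploration are all omitted. The function name `lqr_gain`, the double-integrator matrices, and the iteration count are assumptions chosen purely for illustration.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=2000):
    """Infinite-horizon discrete-time LQR via fixed-point iteration
    of the Riccati equation. Returns (K, P) for the control u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update: P = Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        # Symmetrize to limit floating-point drift
        P = 0.5 * (P + P.T)
    return K, P

# Hypothetical environment: a double integrator with time step 0.1
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost

K, P = lqr_gain(A, B, Q, R)

# The closed-loop system x_{t+1} = (A - B K) x_t should be stable,
# i.e. its spectral radius should be below 1.
spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("gain K:", K)
print("closed-loop spectral radius:", spectral_radius)
```

In the setting the abstract describes, the agent would not know the environment in advance and would instead rely on a decoder to produce a policy from the context; the sketch above only shows what an LQR policy looks like once the dynamics are known.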
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
