Continuous Control with Contexts, Provably

Simon S. Du; Ruosong Wang; Mengdi Wang; Lin F. Yang

arXiv:1910.13614·cs.LG·October 31, 2019·1 cites

Open Access

TL;DR

This paper introduces a provably efficient algorithm for building decoders in continuous control tasks like LQR, enabling agents to adapt to unseen environments with theoretical guarantees and supporting experiments.

Contribution

It presents the first provably efficient algorithm for decoder construction in continuous control, combining UCB-based exploration with theoretical regret bounds.

Findings

01

Algorithm achieves a $\widetilde{O}(\sqrt{T})$ regret bound for online environment adaptation, where $T$ is the number of environments played.

02

Agent can transfer learned knowledge to unseen future environments after playing $\widetilde{O}(1/\epsilon^{2})$ environments.

03

Experimental results validate the effectiveness of the proposed method.
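
The two theoretical findings are linked by a standard online-to-batch argument. The following is a sketch of that link rather than the paper's own proof. It assumes a regret bound of order $\sqrt{T}$ with log factors ignored, environments drawn i.i.d., and an output decoder $\hat{\pi}$ picked uniformly at random from the $T$ decoders used online. The symbols $J_t$, $\pi_t$, $\pi_t^{\star}$, $\hat{\pi}$, and $C$ are introduced here for illustration only.

```latex
\begin{align*}
\mathrm{Regret}(T) &= \sum_{t=1}^{T} \bigl( J_t(\pi_t) - J_t(\pi_t^{\star}) \bigr) \le C\sqrt{T}, \\
\mathbb{E}\bigl[ J(\hat{\pi}) - J(\pi^{\star}) \bigr] &\le \frac{\mathrm{Regret}(T)}{T} \le \frac{C}{\sqrt{T}}, \\
\frac{C}{\sqrt{T}} \le \epsilon &\iff T \ge \frac{C^{2}}{\epsilon^{2}}.
\end{align*}
```

Under these assumptions, an average suboptimality of at most $\epsilon$ on a fresh environment needs on the order of $1/\epsilon^{2}$ played environments.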

Abstract

A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes the context of the unseen new environment as input and generates a policy accordingly. The current paper studies how to build a decoder for the fundamental continuous control task, linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys a $\widetilde{O}(\sqrt{T})$ regret bound in the online setting where $T$ is the number of environments the agent played. This also implies after playing $\widetilde{O}(1/\epsilon^{2})$ environments, the agent…
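
To make the idea of a decoder that "takes the context ... and generates a policy" concrete, here is a minimal sketch for LQR. It is not the paper's algorithm: the linear map from context to dynamics, the names `decode_policy`, `theta_A`, `theta_B`, and the example numbers are all assumptions made for illustration, and the UCB-based estimation of the decoder is left out. The sketch only shows how a known decoder could turn a context vector into a feedback gain by solving the discrete-time Riccati equation.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR via Riccati value iteration.

    Returns the feedback gain K (control law u = -K x) and the cost matrix P.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update: P <- Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
    return K, P


def decode_policy(context, theta_A, theta_B, Q, R):
    """Hypothetical decoder: context -> (A, B) by a linear map, then -> LQR gain.

    theta_A has shape (d, n, n) and theta_B has shape (d, n, m), where d is the
    context dimension. This parameterization is an assumption of this sketch.
    """
    A = np.tensordot(context, theta_A, axes=1)  # sum_i context[i] * theta_A[i]
    B = np.tensordot(context, theta_B, axes=1)
    K, _ = lqr_gain(A, B, Q, R)
    return A, B, K


# Illustrative decoder parameters for a 2-state, 1-input system (made-up numbers).
theta_A = np.array([[[1.0, 1.0], [0.0, 1.0]],
                    [[0.0, 0.0], [0.0, -0.2]]])
theta_B = np.array([[[0.0], [1.0]],
                    [[0.0], [0.5]]])
Q, R = np.eye(2), np.eye(1)

context = np.array([1.0, 0.5])  # context of a new, unseen environment
A, B, K = decode_policy(context, theta_A, theta_B, Q, R)
closed_loop_radius = max(abs(np.linalg.eigvals(A - B @ K)))
print("gain K:", K, "closed-loop spectral radius:", closed_loop_radius)
```

In the online setting described in the abstract, the decoder parameters would be unknown and refined from interaction. This sketch treats them as given, so it illustrates only the decoding step.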

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms