Near-optimal Policy Identification in Active Reinforcement Learning

Xiang Li; Viraj Mehta; Johannes Kirschner; Ian Char; Willie Neiswanger; Jeff Schneider; Andreas Krause; Ilija Bogunovic

arXiv:2212.09510·stat.ML·December 20, 2022

TL;DR

This paper introduces AE-LSVI, a novel algorithm for near-optimal policy identification in active reinforcement learning with generative models, providing polynomial sample complexity guarantees independent of state space size.

Contribution

The paper proposes AE-LSVI, a new kernelized LSVI variant that combines optimism and pessimism for active exploration, with proven uniform near-optimal policy identification and improved sample complexity.
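The idea of combining optimism and pessimism for active exploration can be illustrated with a minimal one-step sketch. It is not the authors' implementation. It assumes a toy reward function, a small grid of states and actions, an RBF kernel, and a fixed confidence width `beta`; all of these names and values are hypothetical. A generative model is simulated by letting the learner query any (state, action) pair. Each round queries the state where the gap between the best optimistic value and the best pessimistic value is largest, and at that state the action with the highest optimistic value. Returning a policy that is greedy in the lower bound is likewise an assumption of this sketch.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=0.5):
    # Squared-exponential kernel between the rows of X and the rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def posterior(X_train, y_train, X_query, noise=1e-2):
    # Kernel ridge / GP posterior mean and standard deviation (zero prior mean).
    if len(X_train) == 0:
        return np.zeros(len(X_query)), np.ones(len(X_query))
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    k = rbf_kernel(X_query, X_train)
    mean = k @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", k, np.linalg.solve(K, k.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))


def confidence_bounds(X_train, y_train, states, actions, beta=2.0):
    # Optimistic (upper) and pessimistic (lower) value estimates on the grid.
    grid = np.array([[s, a] for s in states for a in actions])
    mean, std = posterior(X_train, y_train, grid)
    shape = (len(states), len(actions))
    return (mean + beta * std).reshape(shape), (mean - beta * std).reshape(shape)


def select_query(X_train, y_train, states, actions, beta=2.0):
    # State: largest gap between best optimistic and best pessimistic value.
    # Action: optimistic choice at that state.
    upper, lower = confidence_bounds(X_train, y_train, states, actions, beta)
    gap = upper.max(axis=1) - lower.max(axis=1)
    i = int(np.argmax(gap))
    j = int(np.argmax(upper[i]))
    return i, j, float(gap[i])


def toy_reward(s, a):
    # Hypothetical reward: the best action matches the state.
    return -(a - s) ** 2


def run_active_exploration(rounds=30, beta=2.0):
    states = np.linspace(0.0, 1.0, 5)
    actions = np.linspace(0.0, 1.0, 5)
    X, y = np.empty((0, 2)), np.empty(0)
    for _ in range(rounds):
        i, j, _gap = select_query(X, y, states, actions, beta)
        s, a = states[i], actions[j]
        # Generative-model access: evaluate the chosen (state, action) directly.
        X = np.vstack([X, [s, a]])
        y = np.append(y, toy_reward(s, a))
    _, _, final_gap = select_query(X, y, states, actions, beta)
    _, lower = confidence_bounds(X, y, states, actions, beta)
    policy = actions[lower.argmax(axis=1)]  # greedy in the pessimistic estimate
    return final_gap, policy
```

Calling `run_active_exploration()` returns the largest remaining optimism-pessimism gap over the state grid and one action per state. The gap acts as a worst-case certificate across states in this toy setting, which is the sense in which the sketch targets a policy that is good uniformly rather than only from one initial state.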

Findings

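01 In experiments, AE-LSVI outperforms other RL algorithms when robustness to the initial state is required.

02 AE-LSVI achieves polynomial sample complexity that is independent of the number of states.

03 Specialized to offline contextual Bayesian optimization, the algorithm improves on existing bounds.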

Abstract

Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy uniformly over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved…

Videos

Near-optimal Policy Identification in Active Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms