Optimistic Active Exploration of Dynamical Systems

Bhavya Sukhija; Lenart Treven; Cansu Sancaktar; Sebastian Blaes; Stelian Coros; Andreas Krause

arXiv:2306.12371 · cs.LG · October 31, 2023 · 1 citation

TL;DR

This paper introduces OPAX, an active exploration algorithm for unknown dynamical systems that uses probabilistic models to efficiently learn dynamics and enable zero-shot multi-task planning.

Contribution

The paper develops OPAX, an active exploration method that uses probabilistic models to maximize the information gain between the unknown dynamics and state observations, and supports it with theoretical analysis and empirical evaluation.

Findings

01 OPAX effectively reduces epistemic uncertainty in unknown dynamics.

02 OPAX achieves zero-shot planning performance on downstream tasks.

03 Theoretical sample complexity bounds are established for Gaussian process dynamics.

Abstract

Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model globally approximates the dynamics and allows us to solve multiple downstream tasks in a zero-shot manner? In this paper, we address this challenge by developing an algorithm, OPAX, for active exploration. OPAX uses well-calibrated probabilistic models to quantify the epistemic uncertainty about the unknown dynamics. It optimistically (with respect to plausible dynamics) maximizes the information gain between the unknown dynamics and state observations. We show how the resulting optimization problem can be reduced to an optimal control problem that can be solved at each episode using standard approaches. We analyze our algorithm for general models, and, in the case of Gaussian process dynamics, we give a…
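To make the idea of uncertainty-driven exploration concrete, the following is a minimal sketch of greedy information-gain sampling under a Gaussian process prior. It is not OPAX itself: it queries a static scalar function on a fixed grid, with no dynamics, no optimism over plausible models, and no planning horizon. The function names (`rbf_kernel`, `info_gain`, `greedy_exploration`), the kernel lengthscale, and the noise variance are all assumptions chosen for illustration. The only fact it relies on is that, for a Gaussian quantity with variance v observed under Gaussian noise of variance s, one observation yields an information gain of 0.5 * log(1 + v / s).

```python
import math

def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel between two scalar inputs (illustrative choice)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def info_gain(variance, noise_var):
    """Information gain of one noisy observation of a Gaussian quantity."""
    return 0.5 * math.log(1.0 + variance / noise_var)

def greedy_exploration(candidates, n_steps, noise_var=0.01):
    """Repeatedly query the candidate with the largest information gain.

    Hypothetical helper: returns the queried inputs, the gain of each query,
    and the posterior variance left at every candidate afterwards.
    """
    n = len(candidates)
    # Prior covariance of the unknown function over the candidate grid.
    cov = [[rbf_kernel(a, b) for b in candidates] for a in candidates]
    chosen, gains = [], []
    for _ in range(n_steps):
        # Score each candidate by its current epistemic uncertainty.
        scores = [info_gain(cov[i][i], noise_var) for i in range(n)]
        best = max(range(n), key=scores.__getitem__)
        chosen.append(candidates[best])
        gains.append(scores[best])
        # Rank-one Gaussian conditioning on a noisy observation at `best`.
        # The posterior covariance does not depend on the observed value.
        col = [cov[i][best] for i in range(n)]
        denom = cov[best][best] + noise_var
        cov = [[cov[i][j] - col[i] * col[j] / denom for j in range(n)]
               for i in range(n)]
    remaining = [cov[i][i] for i in range(n)]
    return chosen, gains, remaining

grid = [i / 10 for i in range(11)]
chosen, gains, remaining = greedy_exploration(grid, n_steps=4)
```

In this toy setting each query can only shrink the posterior variance at every grid point, so the per-query gain never increases from one step to the next. The method described in the abstract differs in that the inputs cannot be chosen freely: they must be reached by rolling out a policy through the (unknown) dynamics, which is why the exploration objective becomes an optimal control problem.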

Videos

Optimistic Active Exploration of Dynamical Systems · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research

Methods: Gaussian Process