Optimal Exploration for Model-Based RL in Nonlinear Systems
Andrew Wagenmaker, Guanya Shi, Kevin Jamieson

TL;DR
This paper introduces a method for optimal exploration in nonlinear systems that focuses on learning the most relevant parameters for control, leading to near-optimal policy performance with efficient data collection.
Contribution
It develops a task-dependent exploration algorithm for nonlinear systems, linking policy loss to parameter estimation in a specific metric, and provides theoretical guarantees.
Findings
The algorithm efficiently reduces uncertainty in the parameters most critical for control.
The analysis proves a near-instance-optimal rate for learning controllers.
The method demonstrates effectiveness on nonlinear robotic systems.
Abstract
Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute an optimal controller with the minimum cost on this estimated system (policy optimization). While existing work has shown that it is possible to learn a uniformly good model of the system (Mania et al., 2020), in practice, if we aim to learn a good controller with a low cost on the actual system, certain system parameters may be significantly more critical than others, and we therefore ought to focus our exploration on learning such parameters. In this work, we consider the setting of nonlinear dynamical systems and seek to formally quantify, in such settings, (a) which parameters are most relevant to…
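The three-stage pipeline named in the abstract (exploration, system identification, policy optimization) can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses naive random exploration rather than task-dependent exploration, and the scalar dynamics x' = a*sin(x) + u + noise, the quadratic cost, the gain grid, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def step(a, x, u, noise=0.0):
    # Toy dynamics (assumed for illustration): x' = a*sin(x) + u + noise
    return a * math.sin(x) + u + noise

def explore(a_true, horizon, rng, noise_std=0.1):
    # Exploration: excite the system with random inputs and record transitions.
    data, x = [], 0.0
    for _ in range(horizon):
        u = rng.uniform(-1.0, 1.0)
        x_next = step(a_true, x, u, rng.gauss(0.0, noise_std))
        data.append((x, u, x_next))
        x = x_next
    return data

def identify(data):
    # System identification: scalar least squares for a in x' - u = a*sin(x).
    num = sum(math.sin(x) * (x_next - u) for x, u, x_next in data)
    den = sum(math.sin(x) ** 2 for x, _, _ in data)
    return num / den

def rollout_cost(a, gain, x0=1.0, horizon=20):
    # Noise-free quadratic cost of the feedback law u = -gain * x on model a.
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x * x + 0.1 * u * u
        x = step(a, x, u)
    return cost

def optimize_policy(a_hat, gains):
    # Policy optimization: pick the gain with the lowest cost on the estimated model.
    return min(gains, key=lambda k: rollout_cost(a_hat, k))

rng = random.Random(0)
a_true = 0.8                                  # hidden parameter (hypothetical value)
a_hat = identify(explore(a_true, 500, rng))   # explore, then fit the model
gains = [i / 20 for i in range(41)]           # candidate feedback gains 0.0 .. 2.0
best_gain = optimize_policy(a_hat, gains)     # controller chosen on the estimate
```

The controller is selected on the estimated model and then deployed on the true system, so its realized cost depends on how well exploration pinned down the parameters that matter for control. That dependence is the motivation the abstract gives for focusing exploration on those parameters.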
Taxonomy
Topics: Receptor Mechanisms and Signaling · Advanced Control Systems Optimization · Machine Learning and Algorithms
