TL;DR
This paper introduces a trajectory-optimization-based exploration algorithm for unknown MDPs. Through directed exploration it improves sample efficiency and model fidelity, outperforms intrinsic motivation methods, and remains computationally efficient.
Contribution
It presents a novel approximate solution to the optimal exploration problem in unknown MDPs. The solution combines Bayesian experimental design with trajectory optimization and assumes no prior knowledge of the environment.
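As a reference point, the textbook sequential Bayesian experimental design criterion chooses the next episode's inputs to maximize the expected information gain about the model parameters. The notation is assumed for illustration and is not taken from the paper: θ stands for the model parameters, 𝒟 for the transitions collected so far, u_{1:T} for an episode's actions, and x_{1:T} for the resulting states. The paper's exact objective and approximation may differ.

```latex
u^{\star}_{1:T}
  = \arg\max_{u_{1:T}} \; I\!\left(\theta;\, x_{1:T} \mid u_{1:T}, \mathcal{D}\right)
  = \arg\max_{u_{1:T}} \Big( H\!\left[x_{1:T} \mid u_{1:T}, \mathcal{D}\right]
      - \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\, H\!\left[x_{1:T} \mid \theta, u_{1:T}\right] \Big)
```

Both entropies require integrating over the parameter posterior and over future state trajectories. For this reason, exact optimization is generally intractable and approximate solutions are needed.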
Findings
Faster convergence and higher final model fidelity than intrinsic motivation algorithms.
Remains computationally efficient relative to recent model-based active exploration methods.
Effective directed exploration improves sample efficiency in unknown MDPs.
Abstract
Sample-efficient exploration is crucial not only for discovering rewarding experiences but also for adapting to environment changes in a task-agnostic fashion. A principled treatment of the problem of optimal input synthesis for system identification is provided within the framework of sequential Bayesian experimental design. In this paper, we present an effective trajectory-optimization-based approximate solution of this otherwise intractable problem that models optimal exploration in an unknown Markov decision process (MDP). By interleaving episodic exploration with Bayesian nonlinear system identification, our algorithm takes advantage of the inductive bias to explore in a directed manner, without assuming prior knowledge of the MDP. Empirical evaluations indicate a clear advantage of the proposed algorithm in terms of the rate of convergence and the final model fidelity when…
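The loop the abstract describes can be illustrated with a toy sketch. The loop plans an exploratory episode, executes it, updates a Bayesian dynamics model, and repeats. Every concrete choice below is an assumption and not the paper's method:

- a scalar system with hypothetical true dynamics (`W_TRUE`)
- Bayesian linear regression over a hand-picked feature map `features(s, a)`, standing in for Bayesian nonlinear system identification
- the summed one-step information gain 0.5·log(1 + φᵀSφ/σ²) as the exploration objective
- plain random-shooting search in place of the paper's trajectory optimizer

All function and variable names are hypothetical.

```python
import math
import random

# Hypothetical ground-truth system, unknown to the agent:
#   s_next = s + W_TRUE . features(s, a) + Gaussian noise
W_TRUE = [-0.1, 0.5, 0.3]
NOISE_STD = 0.05


def features(s, a):
    """Hypothetical hand-picked nonlinear feature map phi(s, a)."""
    return [s, a, math.sin(s)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


class BayesLinearModel:
    """Bayesian linear regression on phi(s, a): Gaussian prior, known noise variance."""

    def __init__(self, dim, prior_var=1.0, noise_var=NOISE_STD ** 2):
        self.mean = [0.0] * dim
        self.cov = [[prior_var if i == j else 0.0 for j in range(dim)]
                    for i in range(dim)]
        self.noise_var = noise_var

    def info_gain(self, cov, phi):
        # Entropy reduction of the weight posterior from one noisy observation at phi.
        return 0.5 * math.log1p(dot(phi, matvec(cov, phi)) / self.noise_var)

    def updated_cov(self, cov, phi):
        # Sherman-Morrison rank-one update; it does not depend on the observed target,
        # so it can also be applied to imagined transitions during planning.
        s_phi = matvec(cov, phi)
        denom = self.noise_var + dot(phi, s_phi)
        n = len(phi)
        return [[cov[i][j] - s_phi[i] * s_phi[j] / denom for j in range(n)]
                for i in range(n)]

    def update(self, phi, y):
        # Exact posterior update (Kalman / recursive least squares form).
        s_phi = matvec(self.cov, phi)
        denom = self.noise_var + dot(phi, s_phi)
        err = y - dot(phi, self.mean)
        self.mean = [m + k * err / denom for m, k in zip(self.mean, s_phi)]
        self.cov = self.updated_cov(self.cov, phi)


def plan_episode(model, s0, horizon, n_candidates, rng):
    """Random-shooting search for the action sequence with the largest summed info gain."""
    best_actions, best_gain = None, -math.inf
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cov, gain = s0, model.cov, 0.0
        for a in actions:
            phi = features(s, a)
            gain += model.info_gain(cov, phi)
            cov = model.updated_cov(cov, phi)  # imagined posterior after this step
            s = s + dot(model.mean, phi)       # roll out the posterior-mean model
        if gain > best_gain:
            best_actions, best_gain = actions, gain
    return best_actions, best_gain


def explore(n_episodes=5, horizon=20, n_candidates=64, seed=0):
    """Interleave planned exploratory episodes with Bayesian model updates."""
    rng = random.Random(seed)
    model = BayesLinearModel(dim=3)
    for _ in range(n_episodes):
        actions, _ = plan_episode(model, 0.0, horizon, n_candidates, rng)
        s = 0.0
        for a in actions:  # execute the plan open-loop on the simulated real system
            phi = features(s, a)
            s_next = s + dot(W_TRUE, phi) + rng.gauss(0.0, NOISE_STD)
            model.update(phi, s_next - s)  # regress the state increment on phi
            s = s_next
    return model


if __name__ == "__main__":
    model = explore()
    print("posterior mean:", [round(w, 3) for w in model.mean])
    print("true weights:  ", W_TRUE)
```

Random shooting was chosen only because it fits in a few lines. A stronger optimizer could replace `plan_episode` without changing the rest of the loop. The planner scores features along posterior-mean rollouts, so the summed gain only estimates what the real episode will yield.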
