Not Only Domain Randomization: Universal Policy with Embedding System Identification
Zihan Ding

TL;DR
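This paper introduces UPESI (Universal Policy with Embedding System Identification), an adaptive control method that combines universal policies with system identification performed in an embedding space. It is reported to outperform domain randomization and traditional system identification (SI) across diverse simulation environments.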
Contribution
It proposes conducting system identification in an embedding space using a learned dynamics model and Bayesian optimization, enabling adaptive universal policies.
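The idea of identifying a dynamics embedding from transition data can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: `forward_model` is a hand-written toy standing in for a learned forward dynamics model, the embedding `z` is one-dimensional, and the Bayesian optimization loop is a small NumPy-only Gaussian process with a lower-confidence-bound rule over a fixed candidate grid. The names `si_loss`, `gp_posterior`, and `identify_embedding` are hypothetical.

```python
import numpy as np

def forward_model(s, a, z):
    """Toy stand-in (assumption) for a learned model f(s, a, z) -> next state."""
    return s + 0.1 * (a - z * s)

def si_loss(z, s, a, s_next):
    """Mean squared one-step prediction error for a candidate embedding z."""
    return float(np.mean((forward_model(s, a, z) - s_next) ** 2))

def rbf(x1, x2, length=0.3):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def identify_embedding(s, a, s_next, bounds=(0.0, 2.0),
                       n_init=4, n_iter=15, kappa=2.0, seed=0):
    """Search for the embedding z that best explains the observed transitions."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(bounds[0], bounds[1], 201)
    # Start from a few random embeddings and their prediction losses.
    x_obs = rng.uniform(bounds[0], bounds[1], size=n_init)
    y_obs = np.array([si_loss(z, s, a, s_next) for z in x_obs])
    for _ in range(n_iter):
        # Standardize losses so the unit-variance GP prior is reasonable.
        y_norm = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)
        mean, std = gp_posterior(x_obs, y_norm, candidates)
        # Lower confidence bound: prefer low predicted loss or high uncertainty.
        z_next = candidates[np.argmin(mean - kappa * std)]
        x_obs = np.append(x_obs, z_next)
        y_obs = np.append(y_obs, si_loss(z_next, s, a, s_next))
    return float(x_obs[np.argmin(y_obs)])

if __name__ == "__main__":
    # Transitions from a "new" toy environment whose hidden embedding is 0.7.
    rng = np.random.default_rng(1)
    s, a = rng.normal(size=200), rng.normal(size=200)
    s_next = forward_model(s, a, 0.7)
    print("identified embedding:", identify_embedding(s, a, s_next))
```

In this toy setup the identified value would then be passed to a policy conditioned on the embedding, which is the role the universal policy plays in the description above. A real setting would replace `forward_model` with a trained network and likely use a higher-dimensional embedding and a dedicated Bayesian optimization library.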
Findings
Outperforms domain randomization in various tasks
Effective in both low- and high-dimensional environments
Adapts to new dynamics more effectively and efficiently than domain randomization and traditional SI
Abstract
Domain randomization (DR) cannot provide optimal policies for adapting the learning agent to the dynamics of the environment, although it can generalize sub-optimal policies to work in a transferred domain. In this paper, we present Universal Policy with Embedding System Identification (UPESI), an implicit system identification (SI) approach with universal policies (UPs) that serves as a learning-based control method for executing optimal actions adaptively in environments with various dynamic properties. Previous SI approaches for adaptive policies either conduct explicit SI, which has been shown to be an ill-posed problem, or suffer from low efficiency because they do not leverage the simulation oracle. We propose to conduct SI in the embedding space of system dynamics by leveraging a learned forward dynamics model, and use Bayesian optimization for the SI process given transition data in a new…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Oil and Gas Production Techniques
