Not Only Domain Randomization: Universal Policy with Embedding System Identification

Zihan Ding

arXiv:2109.13438·cs.RO·September 29, 2021·1 cites

TL;DR

This paper introduces UPESI, a novel adaptive control method combining universal policies with embedding system identification, outperforming domain randomization and traditional SI in diverse simulation environments.

Contribution

It proposes conducting system identification in an embedding space using a learned dynamics model and Bayesian optimization, enabling adaptive universal policies.
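The idea of identifying an embedding rather than explicit physical parameters can be sketched in a few lines. The following is a minimal toy illustration, not the paper's implementation: `forward_model` is a hand-written stand-in for a learned forward dynamics model, the embedding `z` is one-dimensional, and the Bayesian optimization loop uses a small Gaussian-process surrogate with a lower-confidence-bound acquisition. All of these choices, along with every function and parameter name, are assumptions made for illustration.

```python
import numpy as np

def forward_model(s, a, z):
    # Hypothetical stand-in for a learned forward dynamics model
    # conditioned on a dynamics embedding z (assumption, not the paper's model).
    return s + 0.1 * (a - z * s)

def si_loss(z, transitions):
    # Mean squared one-step prediction error on observed (s, a, s') data.
    s, a, s_next = transitions
    return float(np.mean((forward_model(s, a, z) - s_next) ** 2))

def rbf(x1, x2, length=0.3):
    # Squared-exponential kernel between two 1-D arrays of points.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    # Gaussian-process posterior mean and std with a constant prior mean.
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_query)
    mean_y = y_obs.mean()
    mu = mean_y + K_s.T @ np.linalg.solve(K, y_obs - mean_y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def identify_embedding(transitions, n_init=3, n_iter=25, kappa=2.0, seed=0):
    # Bayesian optimization over a grid of candidate embeddings in [0, 2].
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 2.0, 201)
    z_obs = rng.choice(candidates, size=n_init, replace=False)
    y_obs = np.array([si_loss(z, transitions) for z in z_obs])
    for _ in range(n_iter):
        scale = y_obs.std() + 1e-12            # rescale losses to unit spread
        mu, sigma = gp_posterior(z_obs, y_obs / scale, candidates)
        z_next = candidates[np.argmin(mu - kappa * sigma)]  # lower confidence bound
        z_obs = np.append(z_obs, z_next)
        y_obs = np.append(y_obs, si_loss(z_next, transitions))
    return float(z_obs[np.argmin(y_obs)])      # best embedding evaluated so far

# Usage: transitions from a "new" environment whose true embedding is unknown.
rng = np.random.default_rng(1)
s, a = rng.normal(size=200), rng.normal(size=200)
transitions = (s, a, forward_model(s, a, 0.7))
z_hat = identify_embedding(transitions)
```

In a full pipeline of this kind, the identified `z_hat` would then be passed to a universal policy conditioned on the embedding; that step is omitted here.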

Findings

1. Outperforms domain randomization in various tasks
2. Effective in both low- and high-dimensional environments
3. Demonstrates superior adaptability and efficiency

Abstract

Domain randomization (DR) cannot provide optimal policies for adapting the learning agent to the dynamics of the environment, although it can generalize sub-optimal policies to work in a transferred domain. In this paper, we present Universal Policy with Embedding System Identification (UPESI), an implicit system identification (SI) approach with universal policies (UPs), as a learning-based control method that executes optimal actions adaptively in environments with varying dynamics properties. Previous SI approaches for adaptive policies either conduct explicit SI, which has been shown to be an ill-posed problem, or suffer from low efficiency because they do not leverage the simulation oracle. We propose to conduct SI in the embedding space of system dynamics by leveraging a learned forward dynamics model, and use Bayesian optimization for the SI process given transition data in a new…

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Oil and Gas Production Techniques