ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems

James Harrison; Animesh Garg; Boris Ivanovic; Yuke Zhu; Silvio Savarese; Li Fei-Fei; Marco Pavone

arXiv:1707.04674·cs.RO·November 10, 2017

TL;DR

ADAPT is a novel algorithm enabling safe, robust, zero-shot transfer of reinforcement learning policies from simulation to real-world stochastic systems, addressing model mismatch and safety concerns.

Contribution

The paper introduces ADAPT, combining offline policy learning with online tube-based MPC to achieve provably safe, adaptive policy transfer without fine-tuning.

Findings

01

ADAPT outperforms direct transfer by 50%-300% in mean reward.

02

ADAPT guarantees safety via state-action tubes under Lipschitz continuity.

03

The method is validated on two simulated non-holonomic systems with various disturbances.
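
The safety claim in finding 02 rests on a tube around the nominal trajectory. As a rough illustration only, and not the paper's actual construction, the sketch below propagates a scalar tube radius under two assumptions that are ours: the closed-loop dynamics are Lipschitz with constant `lipschitz`, and the per-step disturbance is bounded in norm by `w_max`. The function names `tube_radii` and `is_safe`, and all numeric values, are hypothetical.

```python
def tube_radii(lipschitz, w_max, horizon, r0=0.0):
    """Bound the deviation from the nominal trajectory at each step.

    Assumes (hypothetically) that one step maps a deviation of size r
    to at most lipschitz * r, plus a disturbance of norm at most w_max.
    Returns a list of horizon + 1 radii, starting with r0.
    """
    radii = [r0]
    for _ in range(horizon):
        radii.append(lipschitz * radii[-1] + w_max)
    return radii


def is_safe(nominal_distances, radii):
    """Check that the tube never touches the unsafe set.

    nominal_distances[k] is the distance from the nominal state at
    step k to the nearest unsafe state; the tube is safe when that
    distance strictly exceeds the tube radius at every step.
    """
    return all(d > r for d, r in zip(nominal_distances, radii))


if __name__ == "__main__":
    radii = tube_radii(lipschitz=0.5, w_max=1.0, horizon=2)
    print(radii)
    print(is_safe([2.0, 2.0, 2.0], radii))
```

With a Lipschitz constant below 1 the radius stays bounded, while a constant above 1 makes it grow geometrically with the horizon, which is one reason a tube-based controller would replan online rather than trust a long open-loop rollout.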

Abstract

Model-free policy learning has enabled robust performance of complex tasks with relatively simple algorithms. However, this simplicity comes at the cost of requiring an Oracle and arguably very poor sample complexity. This renders such methods unsuitable for physical systems. Variants of model-based methods address this problem through the use of simulators; however, this gives rise to the problem of policy transfer from the simulated to the physical system. Model mismatch due to systematic parameter shift and unmodelled dynamics error may cause sub-optimal or unsafe behavior upon direct transfer. We introduce the Adaptive Policy Transfer for Stochastic Dynamics (ADAPT) algorithm that achieves provably safe and robust, dynamically-feasible zero-shot transfer of RL-policies to new domains with dynamics error. ADAPT combines the strengths of offline policy learning in a black-box source…
