Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Mandy Xie; Anqi Li; Karl Van Wyk; Frank Dellaert; Byron Boots; Nathan Ratliff

arXiv:2105.03019·cs.RO·June 8, 2021

Open Access

TL;DR

This paper introduces Collocation for Demonstration Encoding (CoDE), an imitation learning method that learns policies from a fixed set of offline demonstrations by optimizing the policy and an auxiliary trajectory network simultaneously. It reproduces the demonstrated behavior more accurately and needs fewer demonstrations.

Contribution

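The paper proposes CoDE, an IL technique that trains an auxiliary trajectory network, inspired by collocation techniques in optimal control, alongside the policy. This circumvents the challenges of back-propagation-through-time and allows learning from a fixed offline dataset without interacting with an expert.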

Findings

01. CoDE outperforms behavioral cloning with fewer demonstrations.

02. The method accurately reproduces complex robotic behaviors.

03. Simulation results show successful learning of manipulation tasks.

Abstract

Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories…
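
The abstract names the idea but not CoDE's actual loss, so the following is only a generic collocation-style sketch under stated assumptions, not the paper's formulation. It swaps the auxiliary trajectory network for a list of free knot points `z`, uses a hypothetical 1-D linear policy `v = a*x + b`, and minimizes a data term plus a consistency term by plain gradient descent. All names (`collocation_loss_and_grads`, `fit`, `lam`, `dt`) are illustrative. The point it shows is that the policy is never rolled out during training, so no gradient flows through a long simulated trajectory.

```python
import math

def collocation_loss_and_grads(a, b, z, demo, dt, lam):
    """Loss and analytic gradients for policy v = a*x + b and free knots z.

    Assumed objective (illustrative, not taken from the paper):
      sum_t (z[t] - demo[t])^2
      + lam * sum_t (z[t+1] - z[t] - dt * (a*z[t] + b))^2
    """
    n = len(z)
    loss, ga, gb = 0.0, 0.0, 0.0
    gz = [0.0] * n
    # Data term: the auxiliary trajectory should stay close to the demonstration.
    for t in range(n):
        r = z[t] - demo[t]
        loss += r * r
        gz[t] += 2.0 * r
    # Consistency term: consecutive knots should agree with one Euler step
    # of the policy, evaluated locally at each knot (no rollout).
    for t in range(n - 1):
        d = z[t + 1] - z[t] - dt * (a * z[t] + b)
        loss += lam * d * d
        gz[t + 1] += 2.0 * lam * d
        gz[t] += 2.0 * lam * d * (-1.0 - dt * a)
        ga += 2.0 * lam * d * (-dt * z[t])
        gb += 2.0 * lam * d * (-dt)
    return loss, ga, gb, gz

def fit(demo, dt, lam=10.0, lr=0.01, steps=10000):
    """Update policy parameters and knots together with gradient descent."""
    a, b = 0.0, 0.0
    z = list(demo)  # start the auxiliary trajectory at the demonstration
    for _ in range(steps):
        _, ga, gb, gz = collocation_loss_and_grads(a, b, z, demo, dt, lam)
        a -= lr * ga
        b -= lr * gb
        z = [zt - lr * g for zt, g in zip(z, gz)]
    return a, b, z

def rollout(a, b, x0, dt, n):
    """Execute the learned policy from x0 with Euler steps (evaluation only)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(xs[-1] + dt * (a * xs[-1] + b))
    return xs

# Synthetic demonstration (an assumption for this sketch): exponential decay.
dt, lam = 0.1, 10.0
demo = [math.exp(-dt * t) for t in range(30)]
initial_loss = collocation_loss_and_grads(0.0, 0.0, list(demo), demo, dt, lam)[0]
a, b, z = fit(demo, dt, lam=lam)
final_loss = collocation_loss_and_grads(a, b, z, demo, dt, lam)[0]
traj = rollout(a, b, demo[0], dt, len(demo))
```

In this toy setting the rollout is used only afterwards, to check that the learned policy tracks the demonstration when executed. A neural policy and a trajectory network, as the abstract describes, would replace `a`, `b`, and `z` with learned function approximators.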

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Path Planning Algorithms