Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories
Mandy Xie, Anqi Li, Karl Van Wyk, Frank Dellaert, Byron Boots, Nathan Ratliff

TL;DR
This paper introduces CoDE, an imitation learning method that learns policies from a fixed set of offline demonstrations by optimizing the policy and an auxiliary trajectory network simultaneously, reproducing the demonstrated behavior more accurately and with fewer demonstrations.
Contribution
The paper proposes CoDE, an IL technique that uses an auxiliary trajectory network inspired by collocation methods in optimal control, so that a policy can be learned from offline data without querying an expert and without back-propagation-through-time.
Findings
CoDE outperforms behavioral cloning with fewer demonstrations.
The method accurately reproduces complex robotic behaviors.
Simulation results show successful learning of manipulation tasks.
Abstract
Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories…
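The collocation idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's objective or architecture: it assumes a hypothetical 1-D system x_{t+1} = x_t + dt * theta * x_t with a single policy parameter theta, and it replaces the auxiliary trajectory network with a plain list of free variables z. The names rollout and collocation_fit, the penalty weight lam, and the step size lr are all made up for illustration. The loss combines a fit term (z should match the demonstration) with a defect term (z should be consistent with the policy's dynamics), and both theta and z are updated together by gradient descent, so no gradient is propagated through a rollout.

```python
def rollout(theta, x0, steps, dt):
    """Roll out the toy system x_{t+1} = x_t + dt * theta * x_t from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * theta * xs[-1])
    return xs


def collocation_fit(demo, dt, lam=10.0, lr=0.01, iters=5000):
    """Jointly fit a policy parameter and an auxiliary trajectory (toy sketch).

    Minimizes  sum_t (z_t - demo_t)^2 + lam * sum_t defect_t^2,
    where defect_t = z_{t+1} - (1 + dt * theta) * z_t.
    """
    z = list(demo)   # auxiliary trajectory, initialized at the demonstration
    theta = 0.0      # policy parameter, initialized away from the expert value
    n = len(demo)
    for _ in range(iters):
        a = 1.0 + dt * theta
        # How far the auxiliary trajectory is from obeying the policy dynamics.
        defect = [z[t + 1] - a * z[t] for t in range(n - 1)]
        # Gradient of the fit term with respect to each z_t.
        g_z = [2.0 * (z[t] - demo[t]) for t in range(n)]
        g_theta = 0.0
        # Gradient of the defect penalty with respect to z and theta.
        for t in range(n - 1):
            g_z[t + 1] += 2.0 * lam * defect[t]
            g_z[t] -= 2.0 * lam * defect[t] * a
            g_theta -= 2.0 * lam * defect[t] * dt * z[t]
        # Simultaneous gradient step on the trajectory and the policy.
        z = [z[t] - lr * g_z[t] for t in range(n)]
        theta -= lr * g_theta
    return theta, z


# Demonstration generated by a hypothetical expert with theta = -1.
demo = rollout(-1.0, 1.0, 50, 0.1)
theta_hat, z_hat = collocation_fit(demo, dt=0.1)
print("recovered theta:", theta_hat)
```

Because the defect term only couples neighboring time steps, each update touches local pairs (z_t, z_{t+1}) rather than a full unrolled trajectory, which is the property that motivates collocation-style formulations over back-propagation-through-time.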
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Path Planning Algorithms
