TRAIL: Near-Optimal Imitation Learning with Suboptimal Data
Mengjiao Yang, Sergey Levine, Ofir Nachum

TL;DR
TRAIL uses suboptimal offline data to learn a latent action space, improving the sample efficiency and performance of downstream imitation learning.
Contribution
The paper proposes training objectives that use suboptimal offline data to learn a factored transition model whose structure yields a latent action space, making downstream imitation learning more sample-efficient.
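As a rough illustration of the idea of learning a transition model from suboptimal data, extracting latent actions from it, and then imitating in the latent space, here is a minimal sketch. It is not the paper's objective: the linear toy dynamics, the least-squares fit, the SVD factorization, and every name in it (`B`, `encoder`, `decoder`, `K_expert`, `policy`) are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, latent_dim = 3, 4, 2

# Hypothetical toy dynamics: s' = s + B a, where B has rank 2, so only a
# 2-dimensional subspace of the 4-dimensional action space affects the state.
B = rng.normal(size=(state_dim, latent_dim)) @ rng.normal(size=(latent_dim, action_dim))

def step(s, a):
    """Apply the toy dynamics to a batch of states and actions."""
    return s + a @ B.T

# Stage 1: fit a transition model on plentiful suboptimal (random-action) data.
s_off = rng.normal(size=(1000, state_dim))
a_off = rng.normal(size=(1000, action_dim))
delta = step(s_off, a_off) - s_off
# Least squares solves a_off @ X = delta; W = X.T approximately recovers B.
W = np.linalg.lstsq(a_off, delta, rcond=None)[0].T

# Factor the model (assumption: a truncated SVD stands in for the learned
# factorization). The encoder maps raw actions to latent actions.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
encoder = S[:latent_dim, None] * Vt[:latent_dim]   # shape (latent_dim, action_dim)
decoder = np.linalg.pinv(encoder)                  # shape (action_dim, latent_dim)

# Stage 2: behavioral cloning in the latent action space on a few expert samples.
K_expert = rng.normal(size=(action_dim, state_dim))  # hypothetical expert: a = K s
s_exp = rng.normal(size=(5, state_dim))
a_exp = s_exp @ K_expert.T
z_exp = a_exp @ encoder.T                            # expert actions in latent space
M = np.linalg.lstsq(s_exp, z_exp, rcond=None)[0].T   # latent policy: z = M s

def policy(s):
    """Predict a latent action from the state, then decode it to a raw action."""
    return (s @ M.T) @ decoder.T
```

In this toy setting the latent policy has `latent_dim * state_dim` parameters instead of `action_dim * state_dim`, which is the intuition behind the sample-efficiency claim. The sketch only demonstrates the pipeline and does not reproduce any result from the paper.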
Findings
TRAIL improves on baseline imitation learning performance by up to 4x.
Suboptimal or task-agnostic data, though not useful for direct imitation, leads to better downstream policies by revealing the dynamical structure of the environment.
Theoretical analysis confirms the sample-efficiency benefits of the learned latent action space.
Abstract
The aim in imitation learning is to learn effective policies by utilizing near-optimal expert demonstrations. However, high-quality demonstrations from human experts can be expensive to obtain in large numbers. On the other hand, it is often much easier to obtain large quantities of suboptimal or task-agnostic trajectories, which are not useful for direct imitation, but can nevertheless provide insight into the dynamical structure of the environment, showing what could be done in the environment even if not what should be done. We ask the question, is it possible to utilize such suboptimal offline datasets to facilitate provably improved downstream imitation learning? In this work, we answer this question affirmatively and present training objectives that use offline datasets to learn a factored transition model whose structure enables the extraction of a latent action space. Our…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
