Latent Policies for Adversarial Imitation Learning
Tianyu Wang, Nikhil Karnwal, Nikolay Atanasov

TL;DR
This paper introduces a stable adversarial imitation learning method that uses a latent action space to improve training stability and performance in high-dimensional robot tasks.
Contribution
It proposes LAPAL, a novel approach that employs a latent action space via an encoder-decoder model to enhance GAIL training stability and effectiveness.
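The idea of acting in a latent action space can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the class `LinearActionCodec`, the helper `act`, the tied linear weights, and the dimensions are hypothetical and are not taken from the paper, whose encoder-decoder is a learned model trained before the policy is used. The sketch only shows the data flow: the policy outputs a low-dimensional latent action, and a decoder maps it to the full robot action.

```python
import random


class LinearActionCodec:
    """Hypothetical linear encoder-decoder between actions and latent actions.

    Weights are random and untrained; a real model would be fitted to data.
    """

    def __init__(self, action_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.action_dim = action_dim
        self.latent_dim = latent_dim
        # One weight matrix of shape (action_dim, latent_dim), shared by
        # the encoder (as its transpose) and the decoder.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)]
                        for _ in range(action_dim)]

    def encode(self, action):
        # Project a high-dimensional action onto the latent space.
        return [sum(self.weights[i][j] * action[i]
                    for i in range(self.action_dim))
                for j in range(self.latent_dim)]

    def decode(self, latent):
        # Map a latent action back to the full action space.
        return [sum(self.weights[i][j] * latent[j]
                    for j in range(self.latent_dim))
                for i in range(self.action_dim)]


def act(latent_policy, codec, state):
    """The policy chooses a latent action; the decoder produces the command."""
    latent = latent_policy(state)
    return codec.decode(latent)


# Toy usage: a 7-dimensional action driven by a 2-dimensional latent action.
codec = LinearActionCodec(action_dim=7, latent_dim=2)
toy_policy = lambda state: [0.5, -0.5]  # placeholder for a learned policy
action = act(toy_policy, codec, state=[0.0] * 10)
```

In this arrangement the adversarial training only ever sees the 2-dimensional latent action, which is the property the summary credits for the improved stability.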
Findings
LAPAL training is stable with near-monotonic performance improvements.
LAPAL achieves expert-level performance in complex locomotion and manipulation tasks.
Compared to GAIL, LAPAL converges faster and performs better in high-dimensional environments.
Abstract
This paper considers learning robot locomotion and manipulation tasks from expert demonstrations. Generative adversarial imitation learning (GAIL) trains a discriminator that distinguishes expert from agent transitions, and in turn uses a reward defined by the discriminator output to optimize a policy generator for the agent. This generative adversarial training approach is very powerful but depends on a delicate balance between the discriminator and the generator training. In high-dimensional problems, the discriminator may easily overfit or exploit associations with task-irrelevant features for transition classification. A key insight of this work is that performing imitation learning in a suitable latent task space makes the training process stable, even in challenging high-dimensional problems. We use an action encoder-decoder model to obtain a low-dimensional latent action…
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
Methods: Generative Adversarial Imitation Learning
