IMLE Policy: Fast and Sample Efficient Visuomotor Policy Learning via Implicit Maximum Likelihood Estimation
Krishan Rana, Robert Lee, David Pershouse, Niko Suenderhauf

TL;DR
IMLE Policy introduces a data-efficient, fast, and simple imitation learning method for visuomotor tasks that outperforms existing approaches in low-data and real-time scenarios.
Contribution
The paper presents IMLE Policy, a novel behaviour cloning approach that achieves high performance with less data and faster inference by using implicit maximum likelihood estimation.
Findings
Requires 38% less data on average to match the performance of baseline methods.
Improves inference speed by 97.3% compared to Diffusion Policy.
Effectively learns complex multi-modal behaviours in robotics.
Abstract
Recent advances in imitation learning, particularly using generative modelling techniques like diffusion, have enabled policies to capture complex multi-modal action distributions. However, these methods often require large datasets and multiple inference steps for action generation, posing challenges in robotics where the cost of data collection is high and computational resources are limited. To address this, we introduce IMLE Policy, a novel behaviour cloning approach based on Implicit Maximum Likelihood Estimation (IMLE). IMLE Policy excels in low-data regimes, effectively learning from minimal demonstrations and requiring 38% less data on average to match the performance of baseline methods in learning complex multi-modal behaviours. Its simple generator-based architecture enables single-step action generation, improving inference speed by 97.3% compared to Diffusion Policy, while…
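The core IMLE idea behind a single-step generator policy can be sketched as follows. For each demonstration, several latent codes are sampled, each is decoded into a candidate action, and only the candidate nearest to the demonstrated action is pulled towards it; at inference time a single latent code is decoded in one forward pass. This is a minimal NumPy sketch assuming the basic conditional IMLE objective, a toy linear generator in place of a neural network, and a single observation-action pair. The names `generator`, `imle_step` and `sample_action` are hypothetical, and the paper's exact IMLE variant, architecture and hyperparameters are not reproduced here.

```python
import numpy as np


def generator(W, obs, z):
    # Hypothetical toy generator: one linear map from [obs, z] to an action.
    # An actual policy would use a neural network; this is for illustration only.
    x = np.concatenate([obs, z])
    return W @ x


def imle_step(W, obs, action, num_samples, rng, lr=0.01):
    """One conditional IMLE update for a single (obs, action) pair (sketch).

    Draws several latent codes, decodes one candidate action per code,
    keeps the candidate closest to the demonstrated action, and takes a
    gradient step that pulls only that candidate towards the target.
    Returns the updated weights and the squared distance of the chosen
    candidate measured before the update.
    """
    latent_dim = W.shape[1] - obs.shape[0]
    zs = rng.standard_normal((num_samples, latent_dim))
    candidates = np.stack([generator(W, obs, z) for z in zs])
    dists = np.sum((candidates - action) ** 2, axis=1)
    best = int(np.argmin(dists))  # nearest sample to the demonstration

    x = np.concatenate([obs, zs[best]])
    residual = candidates[best] - action
    grad = 2.0 * np.outer(residual, x)  # d/dW of ||W x - a||^2
    return W - lr * grad, float(dists[best])


def sample_action(W, obs, latent_dim, rng):
    # Inference is a single forward pass: draw one latent code and decode it.
    z = rng.standard_normal(latent_dim)
    return generator(W, obs, z)


# Usage on one made-up observation-action pair (illustrative values only).
rng = np.random.default_rng(0)
obs_dim, latent_dim, act_dim = 3, 2, 2
W = 0.1 * rng.standard_normal((act_dim, obs_dim + latent_dim))
obs = np.array([0.5, -0.2, 1.0])
action = np.array([1.0, -1.0])

losses = []
for _ in range(200):
    W, loss = imle_step(W, obs, action, num_samples=8, rng=rng)
    losses.append(loss)

predicted = sample_action(W, obs, latent_dim, rng)
```

Because every demonstration is matched by its nearest generated sample, each data point ends up with a nearby sample, which is the usual intuition for why IMLE tends not to drop modes. The single call in `sample_action` is what contrasts with the iterative denoising loop of a diffusion policy.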
