Overcoming Knowledge Barriers: Online Imitation Learning from Visual Observation with Pretrained World Models
Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl

TL;DR
This paper introduces AIME-NoB, an algorithm for imitation learning from visual observations that overcomes the embodiment and demonstration knowledge barriers through online interaction and regularization, improving sample efficiency and robustness.
Contribution
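The paper proposes a separate solution for each of the two knowledge barriers (embodiment and demonstration) that arise in Imitation Learning from Observation (ILfO) with pretrained models, and integrates both into the AIME algorithm, improving sample efficiency and converged performance.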
Findings
AIME-NoB outperforms baseline methods in sample efficiency.
It converges to higher performance on vision-based control tasks.
The approach demonstrates robustness across different benchmarks.
Abstract
Pretraining and finetuning models has become increasingly popular in decision-making. But there are still serious impediments in Imitation Learning from Observation (ILfO) with pretrained models. This study identifies two primary obstacles: the Embodiment Knowledge Barrier (EKB) and the Demonstration Knowledge Barrier (DKB). The EKB emerges due to the pretrained models' limitations in handling novel observations, which leads to inaccurate action inference. Conversely, the DKB stems from the reliance on limited demonstration datasets, restricting the model's adaptability across diverse scenarios. We propose separate solutions to overcome each barrier and apply them to Action Inference by Maximising Evidence (AIME), a state-of-the-art algorithm. This new algorithm, AIME-NoB, integrates online interactions and a data-driven regulariser to mitigate the EKB. Additionally, it uses a surrogate…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
