Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion

Kaizhe Hu; Zihang Rui; Yao He; Yuyao Liu; Pu Hua; Huazhe Xu

arXiv:2411.04919 · cs.RO · November 14, 2024

TL;DR

Stem-OB leverages pretrained diffusion models to enhance visual imitation learning by removing low-level visual variations, significantly improving generalization and success rates in both simulated and real-world tasks.

Contribution

This paper introduces Stem-OB, which applies diffusion inversion with a pretrained image diffusion model to make visual imitation learning robust to appearance changes without additional training.

Findings

01. 22.2% average success rate improvement in real-world tasks

02. Effective suppression of visual variations like lighting and textures

03. Plug-and-play method compatible with existing systems

Abstract

Visual imitation learning methods demonstrate strong performance, yet they lack generalization when faced with visual input perturbations, including variations in lighting and textures, impeding their real-world application. We propose Stem-OB that utilizes pretrained image diffusion models to suppress low-level visual differences while maintaining high-level scene structures. This image inversion process is akin to transforming the observation into a shared representation, from which other observations stem, with extraneous details removed. Stem-OB contrasts with data-augmentation approaches as it is robust to various unspecified appearance changes without the need for additional training. Our method is a simple yet highly effective plug-and-play solution. Empirical results confirm the effectiveness of our approach in simulated tasks and show an exceptionally significant improvement in…
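
The intuition that noising an observation suppresses low-level differences can be illustrated with a toy sketch. This is not the authors' implementation and uses no pretrained model: it assumes the standard closed-form DDPM forward sample with a linear beta schedule, two hypothetical flattened "observations" that differ only by a small appearance perturbation, and the same noise draw for both. The names `alpha_bar`, `noise_to_step`, `obs_a`, and `obs_b` are made up for illustration.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps of a linear schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_to_step(x0, t, noise):
    """Closed-form forward sample: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * noise."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * n for x, n in zip(x0, noise)]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
# Hypothetical data: one shared "scene" plus a small appearance perturbation.
scene = [rng.uniform(-1, 1) for _ in range(64)]
obs_a = scene
obs_b = [x + 0.1 * rng.uniform(-1, 1) for x in scene]
# Assumption: both observations receive the same noise draw.
noise = [rng.gauss(0, 1) for _ in range(64)]

# Distance between the two noised observations at increasing steps.
gaps = {
    t: l2(noise_to_step(obs_a, t, noise), noise_to_step(obs_b, t, noise))
    for t in (0, 250, 500, 750)
}
for t, gap in gaps.items():
    print(t, round(gap, 4))
```

Under the shared-noise assumption the gap between the two noised observations is the original gap scaled by sqrt(alpha_bar(t)), so it shrinks as t grows. The sketch shows only this contraction; how Stem-OB keeps high-level scene structure while removing detail depends on the pretrained diffusion model, which this summary does not describe in detail.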

Code & Models

Repositories

hukz18/Stem-Ob-Code (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Diffusion