VILP: Imitation Learning with Latent Video Planning
Zhengtong Xu, Qiang Qiu, Yu She

TL;DR
VILP introduces a latent video diffusion model for efficient, time-consistent video generation to enhance robot policy learning, reducing data needs and supporting multi-modal actions.
Contribution
The paper presents a novel latent video diffusion approach for predictive robot videos, improving efficiency, temporal consistency, and multi-modal action representation in imitation learning.
Findings
VILP has lower training cost and faster inference than the existing video generation robot policy it is compared against.
Generated videos exhibit high temporal consistency across multiple views.
VILP maintains robust policy performance while relying less on high-quality task-specific data.
Abstract
In the era of generative AI, integrating video generation models into robotics opens new possibilities for the general-purpose robot agent. This paper introduces imitation learning with latent video planning (VILP). We propose a latent video diffusion model to generate predictive robot videos that adhere to temporal consistency to a good degree. Our method is able to generate highly time-aligned videos from multiple views, which is crucial for robot policy learning. Our video generation model is highly time-efficient. For example, it can generate videos from two distinct perspectives, each consisting of six frames with a resolution of 96x160 pixels, at a rate of 5 Hz. In the experiments, we demonstrate that VILP outperforms the existing video generation robot policy across several metrics: training costs, inference speed, temporal consistency of generated videos, and the performance of…
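To put the generation rate quoted in the abstract in perspective, the arithmetic can be sketched as below. This is a back-of-the-envelope calculation, not code from the paper. It assumes that "5 Hz" means five complete two-view generations per second, and that 96x160 gives the frame dimensions in pixels (the order of height and width is assumed and does not affect the totals). The function name `throughput` and its dictionary keys are hypothetical.

```python
# Figures taken from the abstract.
VIEWS = 2             # two distinct camera perspectives
FRAMES_PER_VIEW = 6   # six predicted frames per perspective
HEIGHT, WIDTH = 96, 160  # assumed (height, width); the abstract only says "96x160"
RATE_HZ = 5           # assumed: five full multi-view generations per second


def throughput(views, frames_per_view, height, width, rate_hz):
    """Return simple throughput figures for one video generation setup."""
    frames_per_call = views * frames_per_view
    pixels_per_frame = height * width
    return {
        "frames_per_call": frames_per_call,
        "frames_per_second": frames_per_call * rate_hz,
        "pixels_per_second": frames_per_call * pixels_per_frame * rate_hz,
        "seconds_per_call": 1.0 / rate_hz,
    }


stats = throughput(VIEWS, FRAMES_PER_VIEW, HEIGHT, WIDTH, RATE_HZ)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under these assumptions, each generation produces 12 frames in about 0.2 seconds, which works out to 60 generated frames per second across both views.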
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Human Motion and Animation
Methods: Diffusion
