From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa F. Polanía, Yi Yang, Chuhan Zhang, Rishabh Kabra, Anurag Arnab, Mehdi S. M. Sajjadi

TL;DR
This study compares video and image diffusion models and finds that video models generally provide better representations for visual understanding tasks, which points to the importance of temporal information.
Contribution
It provides the first direct comparison of video and image diffusion models for visual understanding, analyzing how temporal data influences representation quality.
Findings
Video diffusion models outperform image models in downstream tasks
Features vary significantly across layers and noise levels
Model size and training budget impact representation quality
Abstract
Diffusion models have revolutionized generative modeling, enabling unprecedented realism in image and video synthesis. This success has sparked interest in leveraging their representations for visual understanding tasks. While recent works have explored this potential for image generation, the visual understanding capabilities of video diffusion models remain largely uncharted. To address this gap, we systematically compare the same model architecture trained for video versus image generation, analyzing the performance of their latent representations on various downstream tasks including image classification, action recognition, depth estimation, and tracking. Results show that video diffusion models consistently outperform their image counterparts, though we find a striking range in the extent of this superiority. We further analyze features extracted from different layers and with…
Taxonomy
Topics: Cultural Industries and Urban Development · Art History and Market Analysis
Methods: Diffusion
