GVD: Guiding Video Diffusion Model for Scalable Video Distillation
Kunyang Li, Jeffrey A Chan Santiago, Sarinda Dhanesh Samarasinghe, Gaowen Liu, Mubarak Shah

TL;DR
GVD is a novel diffusion-based video dataset distillation method that efficiently captures essential spatial and temporal features, enabling high-quality video generation and classification with significantly fewer frames.
Contribution
This paper introduces GVD, the first diffusion-based approach for scalable video dataset distillation that jointly distills spatial and temporal information.
Findings
GVD achieves 78.29% of the original dataset's performance using only 1.98% of the total frames on MiniUCF.
GVD reaches 73.83% of the original dataset's performance using 3.30% of the frames on HMDB51.
GVD outperforms previous state-of-the-art methods on MiniUCF and HMDB51 across 5, 10, and 20 Instances Per Class (IPC).
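To put the frame percentages above in perspective, the short sketch below converts each one into an approximate reduction factor (how many times fewer frames the distilled set contains). The function name `frame_reduction_factor` and the `reported` dictionary are illustrative assumptions, not part of the paper; the only inputs are the percentages quoted in the findings.

```python
def frame_reduction_factor(percent_of_frames_kept: float) -> float:
    """Return how many times fewer frames are kept, given the percent retained.

    Hypothetical helper for illustration; not taken from the GVD paper.
    """
    if not 0 < percent_of_frames_kept <= 100:
        raise ValueError("percent_of_frames_kept must be in (0, 100]")
    return 100.0 / percent_of_frames_kept


# (relative performance %, frames kept %) as quoted in the findings.
reported = {
    "MiniUCF": (78.29, 1.98),
    "HMDB51": (73.83, 3.30),
}

for name, (relative_perf, frames_kept) in reported.items():
    factor = frame_reduction_factor(frames_kept)
    print(f"{name}: {relative_perf}% of original performance "
          f"with roughly {factor:.1f}x fewer frames")
```

Under these figures, the MiniUCF result corresponds to roughly a 50x reduction in frames and the HMDB51 result to roughly a 30x reduction.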
Abstract
To address the larger computation and storage requirements associated with large video datasets, video dataset distillation aims to capture spatial and temporal information in a significantly smaller dataset, such that training on the distilled data has comparable performance to training on all of the data. We propose GVD: Guiding Video Diffusion, the first diffusion-based video distillation method. GVD jointly distills spatial and temporal features, ensuring high-fidelity video generation across diverse actions while capturing essential motion information. Our method's diverse yet representative distillations significantly outperform previous state-of-the-art approaches on the MiniUCF and HMDB51 datasets across 5, 10, and 20 Instances Per Class (IPC). Specifically, our method achieves 78.29 percent of the original dataset's performance using only 1.98 percent of the total number of…
