Distill Video Datasets into Images
Zhenghao Zhao, Haoxuan Wang, Kai Wang, Yuzhang Shang, Yuan Hong, Yan Yan

TL;DR
This paper introduces SFVD, a video dataset distillation method that distills videos into key frames, improving performance (by up to 5.3% on MiniUCF) and optimization efficiency over previous approaches.
Contribution
The paper proposes Single-Frame Video set Distillation (SFVD), a new framework that simplifies video distillation by focusing on informative frames, addressing optimization challenges caused by temporal complexity.
Findings
SFVD outperforms prior methods by up to 5.3% on MiniUCF.
Using single frames captures essential video semantics effectively.
Incorporating temporal information from sampled videos enhances distillation quality.
Abstract
Dataset distillation aims to synthesize compact yet informative datasets that allow models trained on them to achieve performance comparable to training on the full dataset. While this approach has shown promising results for image data, extending dataset distillation methods to video data has proven challenging and often leads to suboptimal performance. In this work, we first identify the core challenge in video set distillation as the substantial increase in learnable parameters introduced by the temporal dimension of video, which complicates optimization and hinders convergence. To address this issue, we observe that a single frame is often sufficient to capture the discriminative semantics of a video. Leveraging this insight, we propose Single-Frame Video set Distillation (SFVD), a framework that distills videos into highly informative frames for each class. Using differentiable…
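The parameter growth that the abstract identifies as the core challenge can be illustrated with simple arithmetic. The sketch below counts the learnable pixel values in a distilled set when each synthetic sample is a full clip versus a single frame. The helper name and every shape used here (50 classes, 1 sample per class, 16 frames, 3 x 112 x 112 pixels) are assumptions chosen for illustration, not values taken from the paper.

```python
def learnable_params(num_classes, per_class, frames, channels, height, width):
    """Count the learnable pixel values in a synthetic (distilled) dataset.

    Hypothetical helper for illustration: every pixel of every synthetic
    sample is treated as one learnable parameter.
    """
    return num_classes * per_class * frames * channels * height * width


# Assumed shapes, not taken from the paper.
NUM_CLASSES = 50
PER_CLASS = 1
FRAMES = 16
CHANNELS, HEIGHT, WIDTH = 3, 112, 112

# Distilling full clips: each sample carries FRAMES frames.
video_params = learnable_params(NUM_CLASSES, PER_CLASS, FRAMES, CHANNELS, HEIGHT, WIDTH)

# Distilling single frames: the temporal dimension collapses to 1.
frame_params = learnable_params(NUM_CLASSES, PER_CLASS, 1, CHANNELS, HEIGHT, WIDTH)

ratio = video_params // frame_params

print(f"video-level parameters: {video_params:,}")
print(f"frame-level parameters: {frame_params:,}")
print(f"reduction factor:       {ratio}x")
```

Under these assumed shapes, the number of values to optimize scales linearly with the clip length, so collapsing each sample to one frame shrinks the optimization problem by a factor equal to the number of frames.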
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
