Distill Video Datasets into Images

Zhenghao Zhao; Haoxuan Wang; Kai Wang; Yuzhang Shang; Yuan Hong; Yan Yan

arXiv:2512.14621 · cs.CV · December 17, 2025

Open Access

TL;DR

This paper introduces SFVD, a video dataset distillation method that distills videos into highly informative frames for each class. It outperforms prior methods by up to 5.3% on MiniUCF and eases the optimization difficulties of earlier approaches.

Contribution

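The paper proposes Single-Frame Video set Distillation (SFVD), a framework that distills videos into informative frames for each class. This simplifies video distillation by avoiding the large increase in learnable parameters that the temporal dimension introduces, which complicates optimization and hinders convergence.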

Findings

01

SFVD outperforms prior methods by up to 5.3% on MiniUCF.

02

Using single frames captures essential video semantics effectively.

03

Incorporating temporal information with sampled videos enhances distillation quality.

Abstract

Dataset distillation aims to synthesize compact yet informative datasets that allow models trained on them to achieve performance comparable to training on the full dataset. While this approach has shown promising results for image data, extending dataset distillation methods to video data has proven challenging and often leads to suboptimal performance. In this work, we first identify the core challenge in video set distillation as the substantial increase in learnable parameters introduced by the temporal dimension of video, which complicates optimization and hinders convergence. To address this issue, we observe that a single frame is often sufficient to capture the discriminative semantics of a video. Leveraging this insight, we propose Single-Frame Video set Distillation (SFVD), a framework that distills videos into highly informative frames for each class. Using differentiable…
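The parameter-growth argument in the abstract can be made concrete with simple arithmetic. The sketch below counts the pixel values that would be optimized when each synthetic sample is a full clip versus a single frame. All sizes (class count, samples per class, clip length, resolution) and the helper `learnable_params` are hypothetical values chosen for illustration, not settings reported in the paper.

```python
def learnable_params(num_classes, per_class, frames, channels, height, width):
    """Count the pixel values optimized in a synthetic dataset."""
    return num_classes * per_class * frames * channels * height * width


# Hypothetical settings for illustration only (not taken from the paper).
NUM_CLASSES = 50
PER_CLASS = 1
FRAMES = 16
CHANNELS, HEIGHT, WIDTH = 3, 112, 112

# Distilling full clips: every frame of every synthetic video is learnable.
video_params = learnable_params(
    NUM_CLASSES, PER_CLASS, FRAMES, CHANNELS, HEIGHT, WIDTH
)

# Distilling single frames: the temporal dimension collapses to 1.
frame_params = learnable_params(
    NUM_CLASSES, PER_CLASS, 1, CHANNELS, HEIGHT, WIDTH
)

print(f"video clips  : {video_params:,} learnable values")
print(f"single frames: {frame_params:,} learnable values")
print(f"ratio        : {video_params // frame_params}x")
```

Under these assumed sizes, the number of learnable values grows linearly with the clip length, so a single-frame representation shrinks the optimization problem by a factor equal to the number of frames.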

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning