Video Dataset Condensation with Diffusion Models

Zhe Li; Hadrien Reynaud; Mischa Dombrowski; Sarah Cechnicka; Franciskus Xaverius Erick; Bernhard Kainz

arXiv:2505.06670 · cs.CV · December 10, 2025

TL;DR

This paper introduces a video dataset distillation method that combines a video diffusion model, a specialized selection network (VST-UNet), and a training-free clustering technique, reporting state-of-the-art performance at reduced computational cost.

Contribution

The paper presents an approach to video dataset distillation built on diffusion models, a tailored selection network, and a training-free clustering algorithm. Synthetic videos are generated only once and then filtered, which lowers computational cost while improving accuracy over prior methods.

Findings

01. Achieves up to 10.61% performance improvement over state-of-the-art methods.

02. Demonstrates effectiveness across four benchmark datasets.

03. Reduces computational costs by generating videos once and using training-free selection.

Abstract

In recent years, the rapid expansion of dataset sizes and the increasing complexity of deep learning models have significantly escalated the demand for computational resources, both for data storage and model training. Dataset distillation has emerged as a promising solution to address this challenge by generating a compact synthetic dataset that retains the essential information from a large real dataset. However, existing methods often suffer from limited performance, particularly in the video domain. In this paper, we focus on video dataset distillation. We begin by employing a video diffusion model to generate synthetic videos. Since the videos are generated only once, this significantly reduces computational costs. Next, we introduce the Video Spatio-Temporal U-Net (VST-UNet), a model designed to select a diverse and informative subset of videos that effectively captures the…
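The abstract is cut off before it explains how the subset is chosen, so the sketch below only illustrates the general idea of picking a diverse subset from a pool of already-generated videos. It is not VST-UNet and not the paper's clustering algorithm. It uses a generic greedy farthest-point (k-center) heuristic as a stand-in, and it assumes each synthetic video has already been mapped to a fixed-length feature vector. The function name `farthest_point_selection` and the toy features are hypothetical.

```python
import math


def farthest_point_selection(features, k):
    """Greedily pick k indices that are spread out in feature space.

    Generic k-center heuristic used purely for illustration; it is an
    assumption, not the selection procedure described in the paper.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(features)")

    # Start from the sample closest to the mean of all feature vectors.
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    first = min(range(n), key=lambda i: math.dist(features[i], mean))
    selected = [first]

    # min_dist[i] is the distance from sample i to its nearest selected sample.
    min_dist = [math.dist(f, features[first]) for f in features]

    while len(selected) < k:
        # Pick the sample that is farthest from everything chosen so far.
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))

    return selected


# Toy pool: five hypothetical 2-D video embeddings forming three loose groups.
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
chosen = farthest_point_selection(pool, k=3)
```

Because the generated pool is fixed, a selection step like this runs once over precomputed features and needs no gradient updates, which is the kind of cost saving the abstract attributes to generating the videos a single time.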

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Convolution · Max Pooling · Concatenated Skip Connection · Diffusion · Focus · U-Net