Generative Spatiotemporal Data Augmentation

Jinfan Zhou; Lixin Luo; Sungmin Eum; Heesung Kwon; Jeong Joon Park

arXiv:2512.12508 · cs.CV · December 16, 2025

TL;DR

This paper introduces a novel spatiotemporal data augmentation method that uses video diffusion models to turn still images into diverse, realistic video variations, improving model performance in low-data settings such as UAV imagery.

Contribution

It presents a new approach that leverages off-the-shelf video diffusion models for spatiotemporal augmentation, together with practical guidelines for implementing it and for addressing disocclusion.
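The page does not say how generated clips are folded into the training set. As a minimal sketch, assuming each generated clip already carries per-frame boxes transferred from its source image, one could subsample a few evenly spaced frames per clip and append them to the real data. `Sample`, `sample_frame_indices`, `build_augmented_set`, and the `frames_per_clip` knob are hypothetical names chosen for illustration, not settings or APIs from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Axis-aligned box in pixel coordinates: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


@dataclass
class Sample:
    """One training image with its boxes (hypothetical container)."""
    image_id: str
    boxes: List[Box]
    synthetic: bool = False


def sample_frame_indices(num_frames: int, k: int, skip_first: bool = True) -> List[int]:
    """Return up to k evenly spaced frame indices from a clip.

    Frame 0 is skipped by default, assuming an image-conditioned video model
    whose first frame closely reproduces the source image.
    """
    start = 1 if skip_first else 0
    available = num_frames - start
    if available <= 0 or k <= 0:
        return []
    k = min(k, available)
    if k == 1:
        return [start + (available - 1) // 2]  # single frame: take the middle one
    # Integer spacing keeps indices distinct and inside [start, num_frames - 1].
    return [start + i * (available - 1) // (k - 1) for i in range(k)]


def build_augmented_set(
    real: List[Sample],
    clip_boxes: Dict[str, List[List[Box]]],
    frames_per_clip: int,
) -> List[Sample]:
    """Append subsampled generated frames to the real samples.

    clip_boxes maps a source image_id to one box list per generated frame,
    i.e. annotations already transferred to each frame (hypothetical format).
    """
    augmented = list(real)
    for src_id, per_frame in clip_boxes.items():
        for idx in sample_frame_indices(len(per_frame), frames_per_clip):
            augmented.append(Sample(f"{src_id}_gen{idx:03d}", per_frame[idx], synthetic=True))
    return augmented
```

Evenly spaced sampling is just one simple way to avoid flooding the training set with near-duplicate neighboring frames; the paper's own frame-selection rule may differ.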

Findings

01. Enhanced model performance in low-data settings
02. A broader training-data distribution than traditional augmentation methods provide
03. Effective augmentation on COCO subsets and UAV-captured datasets

Abstract

We explore spatiotemporal data augmentation using video foundation models to diversify both camera viewpoints and scene dynamics. Unlike existing approaches based on simple geometric transforms or appearance perturbations, our method leverages off-the-shelf video diffusion models to generate realistic 3D spatial and temporal variations from a given image dataset. Incorporating these synthesized video clips as supplemental training data yields consistent performance gains in low-data settings, such as UAV-captured imagery where annotations are scarce. Beyond empirical improvements, we provide practical guidelines for (i) choosing an appropriate spatiotemporal generative setup, (ii) transferring annotations to synthetic frames, and (iii) addressing disocclusion: regions newly revealed and unlabeled in generated views. Experiments on COCO subsets and UAV-captured datasets show that, when…
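The abstract names annotation transfer and disocclusion as practical concerns but does not describe the mechanism. As a minimal sketch under a strong simplifying assumption, suppose the motion from the source image to a generated frame is approximated by a known 3×3 homography `H` mapping source coordinates to frame coordinates (for example, estimated from feature matches). This is a stand-in technique, not the paper's stated method. Boxes are transferred by warping their corners and taking the axis-aligned bounds, and generated-frame pixels that map outside the source image are flagged as disoccluded, since any objects there carry no transferred labels. A homography only captures this frame-boundary case; disocclusion behind objects or from parallax would need depth or dense correspondence. `warp_points`, `transfer_box`, and `disocclusion_mask` are hypothetical helpers, and both frames are assumed to share the same resolution.

```python
from typing import Optional, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords, (N, 3)
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def transfer_box(box: Box, H: np.ndarray, width: int, height: int) -> Optional[Box]:
    """Map a source-image box into a generated frame of the same size.

    Returns None when the clipped box is under 1 px wide or tall,
    i.e. the object has (almost) left the frame.
    """
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
    warped = warp_points(H, corners)
    nx0, ny0 = warped.min(axis=0)
    nx1, ny1 = warped.max(axis=0)
    nx0, nx1 = np.clip([nx0, nx1], 0, width)
    ny0, ny1 = np.clip([ny0, ny1], 0, height)
    if nx1 - nx0 < 1 or ny1 - ny0 < 1:
        return None
    return (float(nx0), float(ny0), float(nx1), float(ny1))


def disocclusion_mask(H: np.ndarray, width: int, height: int) -> np.ndarray:
    """Boolean (height, width) mask, True where a generated-frame pixel center
    maps outside the source image, i.e. content that has no source label."""
    ys, xs = np.mgrid[0:height, 0:width]
    centers = np.stack([xs.ravel() + 0.5, ys.ravel() + 0.5], axis=1)
    src = warp_points(np.linalg.inv(H), centers)  # frame -> source coordinates
    outside = (
        (src[:, 0] < 0) | (src[:, 0] >= width) | (src[:, 1] < 0) | (src[:, 1] >= height)
    )
    return outside.reshape(height, width)
```

One option is to exclude masked pixels from the detection loss, or to drop frames whose masked fraction is large; the paper's own guidelines on disocclusion should take precedence over this sketch.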
