Generative Spatiotemporal Data Augmentation
Jinfan Zhou, Lixin Luo, Sungmin Eum, Heesung Kwon, Jeong Joon Park

TL;DR
This paper introduces a spatiotemporal data augmentation method that uses off-the-shelf video diffusion models to generate realistic video clips with varied camera viewpoints and scene dynamics from still images, improving model performance in low-data settings such as UAV-captured imagery.
Contribution
It presents an approach that leverages off-the-shelf video diffusion models for spatiotemporal augmentation, together with practical guidelines for choosing a generative setup, transferring annotations to synthetic frames, and handling disocclusion.
Findings
Enhanced model performance in low-data settings
Broader data distribution than simple geometric transforms or appearance perturbations provide
Effective augmentation on COCO subsets and UAV-captured datasets
Abstract
We explore spatiotemporal data augmentation using video foundation models to diversify both camera viewpoints and scene dynamics. Unlike existing approaches based on simple geometric transforms or appearance perturbations, our method leverages off-the-shelf video diffusion models to generate realistic 3D spatial and temporal variations from a given image dataset. Incorporating these synthesized video clips as supplemental training data yields consistent performance gains in low-data settings, such as UAV-captured imagery where annotations are scarce. Beyond empirical improvements, we provide practical guidelines for (i) choosing an appropriate spatiotemporal generative setup, (ii) transferring annotations to synthetic frames, and (iii) addressing disocclusion (regions newly revealed and unlabeled in generated views). Experiments on COCO subsets and UAV-captured datasets show that, when…
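To make points (ii) and (iii) of the abstract concrete, here is a minimal toy sketch. It is not the authors' procedure, which this summary does not describe. It assumes the simplest possible viewpoint change, a pure integer pixel shift of the whole image, and the helper names `shift_box` and `disocclusion_mask` are hypothetical. Under that assumption, a bounding box is transferred by applying the same shift, and disoccluded pixels are those in the new frame that no source pixel maps to.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def shift_box(box: Box, dx: int, dy: int, width: int, height: int) -> Optional[Box]:
    """Toy annotation transfer: move a box by (dx, dy) and clip it to the frame.

    Returns None when the shifted box lies entirely outside the frame.
    Assumption: the synthetic view is a pure image-plane shift of the source.
    """
    x0, y0, x1, y1 = box
    x0, x1 = x0 + dx, x1 + dx
    y0, y1 = y0 + dy, y1 + dy
    if x1 < 0 or y1 < 0 or x0 >= width or y0 >= height:
        return None
    return (max(x0, 0), max(y0, 0), min(x1, width - 1), min(y1, height - 1))


def disocclusion_mask(width: int, height: int, dx: int, dy: int) -> List[List[bool]]:
    """Mark target pixels that receive no source pixel after the shift.

    mask[y][x] is True for newly revealed (disoccluded) pixels, which carry
    no transferred label and could be ignored or handled separately.
    """
    mask = [[True] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            tx, ty = x + dx, y + dy
            if 0 <= tx < width and 0 <= ty < height:
                mask[ty][tx] = False  # a source pixel lands here
    return mask


if __name__ == "__main__":
    # A 6x4 frame whose content moves 2 pixels to the right.
    print(shift_box((1, 1, 3, 2), dx=2, dy=0, width=6, height=4))
    for row in disocclusion_mask(width=6, height=4, dx=2, dy=0):
        print("".join("#" if revealed else "." for revealed in row))
```

In a realistic setting the shift would be replaced by whatever correspondence the generative setup provides, and the mask would flag regions where transferred annotations cannot be trusted.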
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Face recognition and analysis
