Grounding Synthetic Data Generation With Vision and Language Models

Ümit Mert Çağlar; Alptekin Temizel

arXiv:2603.09625·cs.CV·May 5, 2026

TL;DR

This paper introduces a vision-language grounded framework for synthetic data augmentation in remote sensing. It also proposes ARAS400k, a dataset of 100k real and 300k synthetic images for evaluating the impact of synthetic data on segmentation and captioning tasks.

Contribution

The paper presents an interpretable synthetic data augmentation framework that combines generative models with vision and language models, and introduces the ARAS400k dataset for remote sensing.

Findings

01  Models trained on synthetic data perform competitively on downstream tasks.

02  Augmented training data improves performance over real-only datasets.

03  ARAS400k enables automated evaluation of synthetic data quality.

Abstract

Deep learning models benefit from increasing data diversity and volume, motivating synthetic data augmentation to improve existing datasets. However, existing evaluation metrics for synthetic data typically calculate latent feature similarity, which is difficult to interpret and does not always correlate with the contribution to downstream tasks. We propose a vision-language grounded framework for interpretable synthetic data augmentation and evaluation in remote sensing. Our approach combines generative models, semantic segmentation and image captioning with vision and language models. Based on this framework, we introduce ARAS400k: A large-scale Remote sensing dataset Augmented with Synthetic data for segmentation and captioning, containing 100k real images and 300k synthetic images, each paired with segmentation maps and descriptions. ARAS400k enables the automated evaluation of…
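
The composition described above (100k real and 300k synthetic images, each paired with a segmentation map and a description) suggests three training regimes to compare: real-only, synthetic-only, and augmented. The following is a minimal sketch of how such splits could be assembled. The `Sample` record, its field names, the file naming, and the `build_training_sets` helper are all hypothetical and are not taken from the ARAS400k repository; the sizes are scaled down by 1000 for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record: one image with its segmentation map and caption."""
    image_path: str
    mask_path: str
    caption: str
    is_synthetic: bool


def build_training_sets(real, synthetic, synthetic_ratio=3.0, seed=0):
    """Return real-only, synthetic-only, and augmented training lists.

    synthetic_ratio is the number of synthetic samples per real sample;
    3.0 mirrors the 300k:100k proportion stated in the abstract.
    """
    rng = random.Random(seed)
    # Never request more synthetic samples than are available.
    n_syn = min(len(synthetic), int(len(real) * synthetic_ratio))
    chosen = rng.sample(synthetic, n_syn)
    augmented = list(real) + chosen
    rng.shuffle(augmented)
    return {"real": list(real), "synthetic": chosen, "augmented": augmented}


def make_samples(n, synthetic):
    """Create placeholder samples; paths and captions are illustrative only."""
    prefix = "syn" if synthetic else "real"
    return [
        Sample(f"{prefix}/{i}.png", f"{prefix}/{i}_mask.png",
               f"placeholder caption {i}", synthetic)
        for i in range(n)
    ]


real = make_samples(100, synthetic=False)
synthetic = make_samples(300, synthetic=True)
sets = build_training_sets(real, synthetic)
```

Training the same segmentation or captioning model on each of the three lists and scoring it on a held-out set of real images is one way to measure the downstream contribution of the synthetic data directly, rather than through latent feature similarity.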

Code & Models

Repositories

caglarmert/ARAS400k (GitHub)