TL;DR
This paper introduces a vision-language grounded framework for synthetic data augmentation in remote sensing and proposes ARAS400k, a dataset of 100k real and 300k synthetic images for evaluating the impact of synthetic data on segmentation and captioning tasks.
Contribution
It presents a novel, interpretable synthetic data augmentation framework that combines generative models with vision and language models, and introduces the ARAS400k dataset for remote sensing.
Findings
Models trained on synthetic data perform competitively on downstream tasks.
Augmented training data improves performance over real-only datasets.
ARAS400k enables automated evaluation of synthetic data quality.
Abstract
Deep learning models benefit from increasing data diversity and volume, motivating synthetic data augmentation to improve existing datasets. However, existing evaluation metrics for synthetic data typically calculate latent feature similarity, which is difficult to interpret and does not always correlate with the contribution to downstream tasks. We propose a vision-language grounded framework for interpretable synthetic data augmentation and evaluation in remote sensing. Our approach combines generative models, semantic segmentation and image captioning with vision and language models. Based on this framework, we introduce ARAS400k: A large-scale Remote sensing dataset Augmented with Synthetic data for segmentation and captioning, containing 100k real images and 300k synthetic images, each paired with segmentation maps and descriptions. ARAS400k enables the automated evaluation of…
