Dataset Creation for Visual Entailment using Generative AI
Rob Reijtenbach, Suzan Verberne, Gijs Wijnholds

TL;DR
This paper introduces a synthetic dataset for visual entailment created using generative AI, demonstrating its effectiveness in training models with minimal performance loss compared to real data.
Contribution
The paper presents a novel method for generating a large synthetic visual entailment dataset from textual data using Stable Diffusion, addressing data scarcity issues.
Findings
Synthetic data yields comparable model performance to real data.
Generated images are valid for training visual entailment classifiers.
Training on the synthetic dataset lowers the F-score on SNLI-VE only slightly (0.686 versus 0.703 with real data).
Abstract
In this paper we present and validate a new synthetic dataset for training visual entailment models. Existing datasets for visual entailment are small and sparse compared to datasets for textual entailment. Manually creating datasets is labor-intensive. We base our synthetic dataset on the SNLI dataset for textual entailment. We take the premise text from SNLI as input prompts in a generative image model, Stable Diffusion, creating an image to replace each textual premise. We evaluate our dataset both intrinsically and extrinsically. For extrinsic evaluation, we evaluate the validity of the generated images by using them as training data for a visual entailment classifier based on CLIP feature vectors. We find that synthetic training data only leads to a slight drop in quality on SNLI-VE, with an F-score of 0.686 compared to 0.703 when trained on real data. We also compare the quality of…
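The dataset construction described in the abstract (replace each SNLI premise with a generated image, keep the hypothesis and label) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `VEExample`, `build_synthetic_ve`, and `placeholder_generator` are hypothetical, and the image generator is passed in as a plain callable so that a real Stable Diffusion wrapper could be substituted. Generating one image per unique premise and reusing it across hypotheses is also an assumption made here for efficiency; the abstract does not say how repeated premises are handled.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class VEExample:
    """One visual entailment item: an image premise, a text hypothesis, a label."""
    image_path: str
    hypothesis: str
    label: str


def build_synthetic_ve(
    snli_rows: Iterable[Tuple[str, str, str]],
    generate_image: Callable[[str], str],
) -> List[VEExample]:
    """Turn (premise, hypothesis, label) triples into image-based examples.

    `generate_image` is a hypothetical callable that takes a text prompt and
    returns the path of the image it produced. Each distinct premise is sent
    to the generator once (an assumption of this sketch) and the result is
    reused for every hypothesis paired with that premise.
    """
    image_for_premise: Dict[str, str] = {}
    examples: List[VEExample] = []
    for premise, hypothesis, label in snli_rows:
        if premise not in image_for_premise:
            image_for_premise[premise] = generate_image(premise)
        examples.append(VEExample(image_for_premise[premise], hypothesis, label))
    return examples


def placeholder_generator(prompt: str) -> str:
    """Stand-in for a text-to-image model: returns a deterministic file name
    derived from the prompt and does not create any image."""
    digest = hashlib.sha1(prompt.encode("utf-8")).hexdigest()[:12]
    return f"images/{digest}.png"
```

Calling `build_synthetic_ve(rows, placeholder_generator)` on a list of SNLI-style triples returns one `VEExample` per triple. In a real pipeline the placeholder would be swapped for a function that runs the image model and writes the result to disk.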
