Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Yiwen Liu, Jessica Bader, Jae Myung Kim

TL;DR
This study investigates whether enforcing the realism of synthetic images, termed feasibility, significantly impacts the performance of CLIP-based classifiers trained on such data, finding minimal effects in most cases.
Contribution
The paper introduces VariReal, a pipeline for minimally editing images to control feasibility, and provides empirical evidence that feasibility has limited impact on classifier performance.
Findings
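Feasibility has minimal effect on CLIP classifier performance: in most cases, training on feasible versus infeasible images changes accuracy by less than 0.3%.
Whether a feasible or infeasible edit hurts classification depends on which attribute is edited.
Mixing feasible and infeasible images in the training set does not significantly change results compared with using only one kind.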
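The mixing comparison in the findings above can be illustrated with a small sketch. It is not the authors' code: the function names `mix_training_set` and `accuracy_gap`, the `feasible_fraction` parameter, and the assumption that each source image has one feasible and one infeasible edited counterpart at the same index are all hypothetical choices made for illustration.

```python
import random


def mix_training_set(feasible, infeasible, feasible_fraction, seed=0):
    """Pick, for each index, either the feasible or the infeasible variant.

    Assumes (hypothetically) that feasible[i] and infeasible[i] are two
    edits of the same source image, so the mixed set keeps the same size.
    """
    if len(feasible) != len(infeasible):
        raise ValueError("feasible and infeasible lists must be paired")
    if not 0.0 <= feasible_fraction <= 1.0:
        raise ValueError("feasible_fraction must lie in [0, 1]")

    rng = random.Random(seed)  # seeded so the mix is reproducible
    mixed = []
    for feasible_item, infeasible_item in zip(feasible, infeasible):
        # rng.random() is in [0, 1), so a fraction of 1.0 always picks
        # the feasible variant and a fraction of 0.0 never does.
        if rng.random() < feasible_fraction:
            mixed.append(feasible_item)
        else:
            mixed.append(infeasible_item)
    return mixed


def accuracy_gap(accuracy_a, accuracy_b):
    """Absolute difference between two accuracies on the same scale."""
    return abs(accuracy_a - accuracy_b)
```

In such a setup, one would train one classifier per training set (feasible only, infeasible only, and mixed), evaluate each on the same real-image test set, and compare the resulting accuracies with `accuracy_gap`.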
Abstract
With the development of photorealistic diffusion models, models trained in part or fully on synthetic data achieve progressively better results. However, diffusion models still routinely generate images that would not exist in reality, such as a dog floating above the ground or with unrealistic texture artifacts. We define the concept of feasibility as whether attributes in a synthetic image could realistically exist in the real-world domain; synthetic images containing attributes that violate this criterion are considered infeasible. Intuitively, infeasible images are typically considered out-of-distribution; thus, training on such images is expected to hinder a model's ability to generalize to real-world data, and they should therefore be excluded from the training set whenever possible. However, does feasibility really matter? In this paper, we investigate whether enforcing…
Taxonomy
Topics: Human Resource Development and Performance Evaluation · Intelligent Tutoring Systems and Adaptive Learning
Methods: Diffusion · Sparse Evolutionary Training · Contrastive Language-Image Pre-training
