Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems
Pratinav Seth, Akshat Bhandari, Kumud Lakara

TL;DR
This paper investigates how incorporating synthetic data generated by GANs affects the robustness and accuracy of computer vision deep learning models, providing insights into optimal data mixing strategies.
Contribution
It offers a detailed analysis of the impact of synthetic data proportions on model performance and robustness in computer vision tasks.
Findings
Synthetic data can improve model robustness to distribution shifts.
Choosing an appropriate synthetic-to-real data ratio improves prediction quality.
Synthetic data helps mitigate class imbalance issues.
Abstract
Deep learning models frequently suffer from various problems such as class imbalance and lack of robustness to distribution shift. It is often difficult to find data suitable for training beyond the available benchmarks. This is especially the case for computer vision models. However, with the advent of Generative Adversarial Networks (GANs), it is now possible to generate high-quality synthetic data. This synthetic data can be used to alleviate some of the challenges faced by deep learning models. In this work we present a detailed analysis of the effect of training computer vision models using different proportions of synthetic data along with real (organic) data. We analyze the effect that various quantities of synthetic data, when mixed with original data, can have on a model's robustness to out-of-distribution data and the general quality of predictions.
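The abstract describes training on different proportions of synthetic and real data, but this summary does not give the exact mixing protocol. The following is a minimal sketch of one way such a mix could be assembled, not the authors' implementation. The function name `mix_datasets` and the parameters `synthetic_fraction`, `total_size`, and `seed` are hypothetical and chosen only for illustration. The sketch assumes both pools are plain Python lists and that the total training-set size is held fixed while the synthetic share varies.

```python
import random


def mix_datasets(real, synthetic, synthetic_fraction, total_size, seed=0):
    """Build a shuffled training set with a fixed share of synthetic samples.

    Hypothetical helper for illustration only; the paper's actual mixing
    procedure is not specified in this summary.

    Returns a list of (sample, is_synthetic) pairs of length `total_size`.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")

    # Number of samples to draw from each pool.
    n_synthetic = round(total_size * synthetic_fraction)
    n_real = total_size - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples for the requested mix")

    # Seeded generator so the same mix can be reproduced across runs.
    rng = random.Random(seed)

    # Sample without replacement from each pool and tag the origin,
    # so later analysis can separate real from synthetic examples.
    mixed = [(x, False) for x in rng.sample(real, n_real)]
    mixed += [(x, True) for x in rng.sample(synthetic, n_synthetic)]
    rng.shuffle(mixed)
    return mixed
```

Calling the helper in a loop over several values of `synthetic_fraction` (for example 0.0, 0.25, 0.5) would yield one training set per proportion, each of which could then be used to train and evaluate a separate model. Keeping `total_size` constant is one design choice among several; an alternative is to keep all real data and add synthetic samples on top, which changes the dataset size along with the ratio.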
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning
