Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems

Pratinav Seth; Akshat Bhandari; Kumud Lakara

arXiv:2303.01268 · cs.CV · March 3, 2023 · 1 citation

Open Access

TL;DR

This paper investigates how incorporating synthetic data generated by GANs affects the robustness and accuracy of computer vision deep learning models, providing insights into optimal data mixing strategies.

Contribution

It offers a detailed analysis of the impact of synthetic data proportions on model performance and robustness in computer vision tasks.

Findings

1. Synthetic data can improve model robustness to distribution shifts.
2. Optimal synthetic-to-real data ratios enhance prediction quality.
3. Synthetic data helps mitigate class imbalance issues.

Abstract

Deep learning models frequently suffer from various problems such as class imbalance and lack of robustness to distribution shift. It is often difficult to find data suitable for training beyond the available benchmarks. This is especially the case for computer vision models. However, with the advent of Generative Adversarial Networks (GANs), it is now possible to generate high-quality synthetic data. This synthetic data can be used to alleviate some of the challenges faced by deep learning models. In this work we present a detailed analysis of the effect of training computer vision models using different proportions of synthetic data along with real (organic) data. We analyze the effect that various quantities of synthetic data, when mixed with original data, can have on a model's robustness to out-of-distribution data and the general quality of predictions.
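
The mixing of synthetic and real data described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `mix_datasets`, the `synthetic_fraction` parameter, and the choice to keep every real sample and add synthetic ones on top are all assumptions made for illustration. The abstract does not say how the proportions were constructed.

```python
import random


def mix_datasets(real, synthetic, synthetic_fraction, seed=0):
    """Return a shuffled training set in which roughly `synthetic_fraction`
    of the samples are synthetic.

    Assumption (not from the paper): all real samples are kept and
    synthetic samples are added on top of them.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")

    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    n_syn = round(synthetic_fraction * len(real) / (1.0 - synthetic_fraction))
    if n_syn > len(synthetic):
        raise ValueError("not enough synthetic samples for this fraction")

    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for image/label pairs; the first field marks the source.
real = [("real", i) for i in range(80)]
synthetic = [("synthetic", i) for i in range(100)]

mixed = mix_datasets(real, synthetic, synthetic_fraction=0.2)
n_synthetic = sum(1 for source, _ in mixed if source == "synthetic")
print(len(mixed), n_synthetic)
```

Sweeping `synthetic_fraction` over several values and training one model per mix would be one way to set up the kind of comparison the abstract describes. An alternative design holds the total dataset size fixed and replaces real samples with synthetic ones, which separates the effect of the proportion from the effect of simply having more data.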

Taxonomy

Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning