# Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

**Authors:** Tuan Anh Le, Atilim Gunes Baydin, Robert Zinkov, Frank Wood

arXiv: 1703.00868 · 2017-03-03

## TL;DR

This paper draws a formal connection between training neural networks on synthetic data and approximate Bayesian, model-based reasoning. It demonstrates the connection with a CAPTCHA-breaking network that reaches state-of-the-art performance, and discusses what the results imply for robustness and generalization.

## Contribution

The paper frames neural network training on synthetic data as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. It demonstrates this view with a new CAPTCHA-breaking architecture trained solely on synthetic data.
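The link between synthetic-data training and Bayesian inference can be illustrated on a toy problem. The sketch below is not from the paper: it assumes a hypothetical one-dimensional generative model (`x ~ Normal(0, 1)`, `y | x ~ Normal(x, noise_std^2)`) and swaps the neural network for a linear-Gaussian family `q(x | y)`, so that the maximum-likelihood fit has a closed form. All function names are illustrative. Fitting `q` to pairs sampled from the model should approximately recover the exact posterior, which is known analytically here.

```python
import random


def sample_synthetic(n, noise_std, rng):
    """Draw (latent, observation) pairs from the toy generative model.

    Assumed model (illustrative, not the paper's):
    x ~ Normal(0, 1), y | x ~ Normal(x, noise_std^2).
    """
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(x, noise_std)
        pairs.append((x, y))
    return pairs


def fit_gaussian_proposal(pairs):
    """Maximum-likelihood fit of q(x | y) = Normal(a*y + b, var).

    For a Gaussian with a linear mean, maximizing the average
    log q(x | y) over synthetic pairs reduces to least squares of x on y.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    a = cov_xy / var_y
    b = mean_x - a * mean_y
    var = sum((x - (a * y + b)) ** 2 for x, y in pairs) / n
    return a, b, var


def exact_posterior(noise_std):
    """Analytic posterior x | y ~ Normal(slope * y, var) for the toy model."""
    slope = 1.0 / (1.0 + noise_std ** 2)
    var = noise_std ** 2 / (1.0 + noise_std ** 2)
    return slope, var


rng = random.Random(0)
noise_std = 0.5
a, b, var = fit_gaussian_proposal(sample_synthetic(50_000, noise_std, rng))
slope, post_var = exact_posterior(noise_std)
print(f"learned: mean = {a:.3f}*y + {b:.3f}, var = {var:.3f}")
print(f"exact:   mean = {slope:.3f}*y + 0.000, var = {post_var:.3f}")
```

The fitted model never sees a "real" observation during training. Whatever it learns about `x` given `y` comes from the generative model, which is why the quality of that model matters for generalization.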

## Key findings

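- Achieves state-of-the-art CAPTCHA-breaking performance with a network trained on synthetic data
- Provides a way of computing task-specific posterior uncertainty
- Breaks real-world CAPTCHAs in use by Facebook and Wikipedia at the time of writing
- Discusses the robustness of synthetic-data results and the considerations needed for good generalization when training on synthetic data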

## Abstract

We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.
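The abstract's central claim can be written as one identity. The notation here is assumed for illustration and is not quoted from the paper. Let $p(x, y) = p(x)\,p(y \mid x)$ be the synthetic-data generative model, with latent $x$ (for example, the Captcha text) and observation $y$ (the rendered image), and let $q(x \mid y; \phi)$ be the distribution produced by a network with parameters $\phi$. Then:

$$
\mathbb{E}_{p(y)}\!\left[\mathrm{KL}\!\left(p(x \mid y)\,\|\,q(x \mid y; \phi)\right)\right]
= \mathbb{E}_{p(x, y)}\!\left[-\log q(x \mid y; \phi)\right] + \mathbb{E}_{p(x, y)}\!\left[\log p(x \mid y)\right]
$$

The last term does not depend on $\phi$. Minimizing the negative log-likelihood of synthetic pairs $(x, y) \sim p(x, y)$, which is the usual supervised training loss, therefore also minimizes the expected divergence between the true posterior and the network's output distribution. Under this reading, the trained network acts as a proposal for posterior inference in the generative model.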

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.00868/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1703.00868/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1703.00868/full.md

---
Source: https://tomesphere.com/paper/1703.00868