An information theoretic limit to data amplification
S. J. Watts, L. Crow

TL;DR
This paper establishes an information theoretic limit on data amplification with generative models such as GANs: increasing the sample size does not improve the resolution of variables, although it can improve statistical significance.
Contribution
It introduces a mathematical bound on data amplification, shows that the information content of the data remains unchanged when the sample size is increased, and clarifies the conditions under which this bound applies.
Findings
A gain greater than one is possible without increasing information content.
The resolution of variables is not improved by data amplification.
Sample size increase can improve statistical significance without extra information.
Abstract
In recent years, generative artificial intelligence has been used to create data to support science analysis. For example, Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem. This has the advantage that a GAN creates data in a significantly reduced computing time. N training events for a GAN can result in GN generated events, with the gain factor, G, being more than one. This appears to violate the principle that one cannot get information for free. This is not the only way to amplify data, so this process will be referred to as data amplification, which is studied using information theoretic concepts. It is shown that a gain of greater than one is possible whilst keeping the information content of the data unchanged. This leads to a mathematical bound which only depends on the number of generated…
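The claim that a gain G greater than one adds no information can be illustrated with a toy experiment. The sketch below is not the paper's method or its bound: it assumes a fitted Gaussian as a stand-in for the GAN, and the function name and parameters are hypothetical. It draws N training events, fits the model, generates GN events from it, and compares the actual error on the mean (measured against the known truth) with the naive 1/sqrt(GN) error one would quote if the generated events were treated as independent data.

```python
import random
import statistics


def amplified_mean_error(n_train, gain, n_trials=200, seed=0):
    """Toy data amplification: a fitted Gaussian stands in for a GAN (assumption).

    Returns (rms_train, rms_amplified, naive_error), where the RMS values are
    errors on the mean relative to the true mean of 0, averaged over trials,
    and naive_error is 1/sqrt(G*N), i.e. what independent events would give.
    """
    rng = random.Random(seed)
    sq_err_train = 0.0
    sq_err_amp = 0.0
    for _ in range(n_trials):
        # N "training" events from the true distribution (mean 0, sigma 1).
        train = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        mu = statistics.fmean(train)
        sigma = statistics.stdev(train)
        # G*N "generated" events from the model fitted to the training set.
        generated = [rng.gauss(mu, sigma) for _ in range(gain * n_train)]
        sq_err_train += mu ** 2
        sq_err_amp += statistics.fmean(generated) ** 2
    rms_train = (sq_err_train / n_trials) ** 0.5
    rms_amp = (sq_err_amp / n_trials) ** 0.5
    naive_error = 1.0 / (gain * n_train) ** 0.5
    return rms_train, rms_amp, naive_error


if __name__ == "__main__":
    rms_train, rms_amp, naive = amplified_mean_error(n_train=100, gain=10)
    print(f"training sample RMS error:  {rms_train:.3f}")
    print(f"amplified sample RMS error: {rms_amp:.3f}")
    print(f"naive 1/sqrt(GN) error:     {naive:.3f}")
```

In this toy setting the amplified sample's error stays close to the 1/sqrt(N) of the training sample rather than falling to 1/sqrt(GN), which is consistent with the statement that the information content is unchanged by amplification.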
Taxonomy
Topics: Statistical Mechanics and Entropy · Complex Systems and Time Series Analysis
