Can we spot a fake?
Shahar Mendelson, Grigoris Paouris, Roman Vershynin

TL;DR
This paper asks how large a corruption an adversary can add to a data point while keeping the fake statistically indistinguishable from real data. It relates that largest radius to the geometry of the set of possible adversarial tricks and extends the analysis beyond Gaussian data.
Contribution
The authors bound the detectability radius of fake data in terms of the Gaussian width of the adversary's trick set. The upper bound extends to arbitrary sets and to non-Gaussian distributions of the real data.
Findings
For highly symmetric trick sets T, the detectability radius is approximately twice the scaled Gaussian width of T.
The upper bound on the detectability radius holds for arbitrary sets T and for arbitrary, non-Gaussian distributions of the real data.
The authors conjecture that focusing on the most important directions of T can improve the bounds for sets that are not highly symmetric.
Abstract
The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's ``tricks'', and where $r > 0$ is a fixed radius. The adversary's choice of $t = t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X + rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r = r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian…
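To make the central quantity concrete, here is a minimal sketch that estimates the Gaussian width of a finite trick set by Monte Carlo, using the standard definition $w(T) = \mathbb{E}\,\sup_{t \in T} \langle g, t \rangle$ with $g$ standard normal. The function names `gaussian_width` and `signed_basis`, the sample count, and the example set are assumptions made for illustration and do not come from the paper. The scaling that the abstract applies to the width is not given in this summary, so the sketch computes only the unscaled width.

```python
import random


def gaussian_width(points, n_samples=2000, seed=0):
    """Monte Carlo estimate of w(T) = E max_{t in T} <g, t> for a finite set T.

    `points` is a list of equal-length lists of floats (the set T).
    This is an illustrative estimator, not the paper's procedure.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(n_samples):
        # Draw one standard normal vector g in R^dim.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Largest inner product <g, t> over the set T.
        total += max(sum(gi * ti for gi, ti in zip(g, t)) for t in points)
    return total / n_samples


def signed_basis(dim):
    """Hypothetical symmetric example set: all vectors +e_i and -e_i."""
    pts = []
    for i in range(dim):
        for sign in (1.0, -1.0):
            v = [0.0] * dim
            v[i] = sign
            pts.append(v)
    return pts


if __name__ == "__main__":
    T = signed_basis(50)
    print("estimated Gaussian width:", gaussian_width(T))
```

For the signed basis vectors the supremum equals the largest absolute coordinate of $g$, so the estimate grows slowly with the dimension, roughly like $\sqrt{2 \log n}$.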
