# The Odds are Odd: A Statistical Test for Detecting Adversarial Examples

**Authors:** Kevin Roth, Yannic Kilcher, Thomas Hofmann

arXiv: 1902.04818 · 2019-05-10

## TL;DR

This paper introduces a statistical test on a classifier's log-odds that detects adversarially manipulated inputs, including those produced by white-box attacks, and can be used to correct the resulting predictions at test time.

## Contribution

The authors propose a simple statistical test that exploits anomalies introduced by adversarial perturbations. They give conditions under which detection is guaranteed and validate both detection and correction empirically.

## Key findings

- Detectability of adversarial examples is theoretically guaranteed under certain conditions.
- The proposed test can be computed easily and calibrated using random input corruption.
- Test-time predictions on adversarial examples can be corrected with high accuracy.
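
The second finding, a statistic computed and calibrated by randomly corrupting inputs, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian noise model, the sample count, and the per-class-pair calibration arrays `mu` and `sd` are all assumptions made for illustration. The idea shown is to measure how the pairwise log-odds move when noise is added to an input, then flag the input if that shift is unusually large compared with shifts observed on clean data.

```python
import numpy as np


def log_odds(logits, y):
    """Pairwise log-odds f_z - f_y for every class z, relative to class y."""
    return logits - logits[y]


def mean_log_odds_shift(logits_fn, x, n_samples=256, sigma=0.1, rng=None):
    """Average change in log-odds when x is corrupted with Gaussian noise.

    logits_fn is any callable mapping an input array to a 1-D logit vector
    (a hypothetical stand-in for the defended model).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    base = logits_fn(x)
    y = int(np.argmax(base))          # class predicted on the uncorrupted input
    clean = log_odds(base, y)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    shifts = np.stack([log_odds(logits_fn(x + eta), y) - clean for eta in noise])
    return y, shifts.mean(axis=0)


def is_flagged(shift, y, mu, sd, threshold):
    """Flag the input if any standardized shift reaches the threshold.

    mu[y, z] and sd[y, z] are assumed to be the mean and standard deviation
    of the shift toward class z, estimated on clean inputs predicted as y.
    """
    z = (np.asarray(shift, dtype=float) - mu[y]) / sd[y]
    z[y] = -np.inf                    # the predicted class is not a candidate
    return bool(np.max(z) >= threshold)


# Toy usage with a linear "model"; a real defense would wrap a trained network.
W = np.eye(3)
pred, shift = mean_log_odds_shift(lambda v: W @ v, np.array([2.0, 0.0, 0.0]))
flag = is_flagged(shift, pred, np.zeros((3, 3)), np.ones((3, 3)), threshold=3.0)
```

In this sketch the calibration step is left to the caller: `mu`, `sd`, and `threshold` would be estimated from clean data, for example by choosing the threshold that yields an acceptable false-positive rate.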

## Abstract

We investigate conditions under which test statistics exist that can reliably detect examples, which have been adversarially manipulated in a white-box attack. These statistics can be easily computed and calibrated by randomly corrupting inputs. They exploit certain anomalies that adversarial attacks introduce, in particular if they follow the paradigm of choosing perturbations optimally under p-norm constraints. Access to the log-odds is the only requirement to defend models. We justify our approach empirically, but also provide conditions under which detectability via the suggested test statistics is guaranteed to be effective. In our experiments, we show that it is even possible to correct test time predictions for adversarial attacks with high accuracy.
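
One way to write down a statistic of the kind the abstract describes is given below. The symbols are assumptions introduced here for illustration and are not taken from this summary: $f_z(x)$ is the logit of class $z$, $y$ is the predicted class, $\eta$ is random noise, and $\mu_{y,z}$, $\sigma_{y,z}$, $\tau_{y,z}$ are calibration constants estimated from clean data.

$$
\begin{aligned}
f_{y,z}(x) &= f_z(x) - f_y(x) && \text{(pairwise log-odds)} \\
g_{y,z}(x, \eta) &= f_{y,z}(x + \eta) - f_{y,z}(x) && \text{(shift under noise)} \\
\bar{g}_{y,z}(x) &= \frac{\mathbb{E}_{\eta}\left[ g_{y,z}(x, \eta) \right] - \mu_{y,z}}{\sigma_{y,z}} && \text{(standardized expected shift)} \\
\text{flag } x \text{ as adversarial} &\iff \max_{z \neq y} \left( \bar{g}_{y,z}(x) - \tau_{y,z} \right) \ge 0
\end{aligned}
$$

Under this reading, only the log-odds are needed, which matches the abstract's statement that access to the log-odds is the sole requirement. A natural correction rule would reassign a flagged input to the class $z$ attaining the maximum.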

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.04818/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1902.04818/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1902.04818/full.md

---
Source: https://tomesphere.com/paper/1902.04818