# Single Image Backdoor Inversion via Robust Smoothed Classifiers

**Authors:** Mingjie Sun, J. Zico Kolter

arXiv: 2303.00215 · 2023-12-19

## TL;DR

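This paper introduces SmoothInv, a backdoor inversion method that recovers a hidden backdoor trigger from as few as a single clean image by running projected gradient descent on a robust smoothed version of the backdoored classifier. Compared to existing inversion methods, it introduces minimal optimization variables and needs no complex regularization scheme.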

## Contribution

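SmoothInv recovers a hidden backdoor from as few as a single image. It builds on adversarial robustness by optimizing against a robust smoothed version of the backdoored classifier, introduces minimal optimization variables, and does not require complex regularization schemes.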

## Key findings

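- Reconstructed backdoors reach close to 100% attack success rates on backdoored ImageNet classifiers.
- Backdoors recovered from a single clean image maintain high fidelity to the underlying true backdoors.
- SmoothInv remains robust against an adaptive attacker under the two countermeasures the authors propose and analyze.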
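The attack success rate mentioned above is commonly computed as the fraction of non-target images that a classifier assigns to the target class once the trigger is applied. The sketch below assumes that common definition and an additive trigger on images scaled to [0, 1]; the function name `attack_success_rate` and the `predict` callable are hypothetical and not taken from the paper's code.

```python
import numpy as np

def attack_success_rate(predict, images, labels, trigger, target):
    """Fraction of non-target images classified as `target` after adding `trigger`.

    predict: callable mapping an (n, d) array to an (n,) array of class ids (assumed).
    images:  (n, d) array with values in [0, 1].
    trigger: (d,) additive pattern (assumption: the trigger is additive).
    """
    keep = labels != target  # images already in the target class are excluded
    if not keep.any():
        raise ValueError("need at least one image outside the target class")
    patched = np.clip(images[keep] + trigger, 0.0, 1.0)  # keep pixels in range
    return float(np.mean(predict(patched) == target))
```

For instance, a value of 1.0 means every non-target image in the evaluation set was flipped to the target class.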

## Abstract

Backdoor inversion, a central step in many backdoor defenses, is a reverse-engineering process to recover the hidden backdoor trigger inserted into a machine learning model. Existing approaches tackle this problem by searching for a backdoor pattern that is able to flip a set of clean images into the target class, while the exact size needed of this support set is rarely investigated. In this work, we present a new approach for backdoor inversion, which is able to recover the hidden backdoor with as few as a single image. Inspired by recent advances in adversarial robustness, our method SmoothInv starts from a single clean image, and then performs projected gradient descent towards the target class on a robust smoothed version of the original backdoored classifier. We find that backdoor patterns emerge naturally from such optimization process. Compared to existing backdoor inversion methods, SmoothInv introduces minimum optimization variables and does not require complex regularization schemes. We perform a comprehensive quantitative and qualitative study on backdoored classifiers obtained from existing backdoor attacks. We demonstrate that SmoothInv consistently recovers successful backdoors from single images: for backdoored ImageNet classifiers, our reconstructed backdoors have close to 100% attack success rates. We also show that they maintain high fidelity to the underlying true backdoors. Last, we propose and analyze two countermeasures to our approach and show that SmoothInv remains robust in the face of an adaptive attacker. Our code is available at https://github.com/locuslab/smoothinv.
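The optimization described in the abstract (start from one clean image, then run projected gradient steps towards the target class on a smoothed classifier) can be illustrated with a toy sketch. This is not the authors' implementation: a linear softmax model stands in for the backdoored network, smoothing is a plain Monte Carlo average over Gaussian noise, and the L2 constraint, radius `eps`, noise level `sigma`, step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smoothed_target_prob_and_grad(W, b, x, target, sigma, n_samples, rng):
    """Monte Carlo estimate of the smoothed target probability and its gradient in x.

    The toy base classifier is softmax(W x + b); x is a flattened image of shape (d,).
    """
    noise = rng.normal(0.0, sigma, size=(n_samples, x.size))
    probs = softmax((x + noise) @ W.T + b)          # (n_samples, n_classes)
    p_t = probs[:, target]                          # (n_samples,)
    # Analytic gradient of softmax output t: p_t * (W_t - sum_j p_j W_j)
    grads = p_t[:, None] * (W[target] - probs @ W)  # (n_samples, d)
    return float(p_t.mean()), grads.mean(axis=0)

def smoothinv_sketch(W, b, x0, target, eps=2.0, sigma=0.25,
                     steps=50, step_size=0.2, n_samples=64, seed=0):
    """Projected gradient ascent on the smoothed target probability (L2 ball, assumed)."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x0)
    for _ in range(steps):
        _, g = smoothed_target_prob_and_grad(
            W, b, x0 + delta, target, sigma, n_samples, rng)
        g_norm = np.linalg.norm(g)
        if g_norm > 0:
            delta = delta + step_size * g / g_norm  # normalized ascent step
        d_norm = np.linalg.norm(delta)
        if d_norm > eps:
            delta = delta * (eps / d_norm)          # project back onto the L2 ball
        delta = np.clip(x0 + delta, 0.0, 1.0) - x0  # keep the image in [0, 1]
    return delta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W, b = rng.normal(size=(3, 16)), np.zeros(3)    # hypothetical toy classifier
    x0 = rng.uniform(0.2, 0.8, size=16)             # the single "clean image"
    delta = smoothinv_sketch(W, b, x0, target=0)
    print("perturbation L2 norm:", np.linalg.norm(delta))
```

In this sketch the returned `delta` plays the role of the recovered pattern. With a real backdoored network, the analytic gradient would be replaced by automatic differentiation through the smoothed model.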

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2303.00215/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/2303.00215/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/2303.00215/full.md

---
Source: https://tomesphere.com/paper/2303.00215