# Bypassing Backdoor Detection Algorithms in Deep Learning

**Authors:** Te Juin Lester Tan, Reza Shokri

arXiv: 1905.13409 · 2020-06-09

## TL;DR

This paper introduces an adversarial backdoor embedding method that can evade existing detection algorithms by making poisoned data indistinguishable from clean data in the model's hidden representations.

## Contribution

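It proposes an adaptive adversarial training algorithm that minimizes the model's original loss while maximizing the indistinguishability of the hidden representations of poisoned and clean data, so that detectors relying on those representations have no statistical signal to find.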
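One way to write such an objective, as a sketch only: let $h_\theta(x)$ be the model's hidden representation of input $x$, $\mathcal{D}_c$ the clean training data, $\mathcal{D}_p$ the poisoned data (trigger-stamped inputs carrying the adversary's target labels), and $D_\phi$ a hypothetical discriminator that outputs the probability that a representation comes from clean data. This notation and the cross-entropy form of the discriminator loss are assumptions for illustration, not taken from the paper.

$$
\min_{\theta}\; \mathcal{L}_{\mathrm{task}}\big(\theta;\, \mathcal{D}_{c} \cup \mathcal{D}_{p}\big) \;-\; \lambda\, \mathcal{L}_{\mathrm{disc}}\big(\phi^{*}(\theta);\, \theta\big),
\qquad
\phi^{*}(\theta) = \arg\min_{\phi} \mathcal{L}_{\mathrm{disc}}(\phi;\, \theta),
$$

$$
\mathcal{L}_{\mathrm{disc}}(\phi;\, \theta) = -\,\mathbb{E}_{x \sim \mathcal{D}_{c}}\big[\log D_{\phi}(h_{\theta}(x))\big] \;-\; \mathbb{E}_{x \sim \mathcal{D}_{p}}\big[\log\big(1 - D_{\phi}(h_{\theta}(x))\big)\big]
$$

Under this reading, training alternates between updating $\phi$ to separate clean from poisoned representations and updating $\theta$ to fit the task while driving the discriminator loss up; the weight $\lambda$ trades clean and backdoor accuracy against how well the two sets of representations are blended.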

## Key findings

- The method successfully bypasses state-of-the-art backdoor detection algorithms.
- Poisoned models maintain high accuracy on clean data while evading detection.
- The results highlight the need for adversary-aware defense mechanisms in backdoor detection.

## Abstract

Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input samples. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model, so the model behaves according to the adversary's objective if the input contains the backdoor features, referred to as the backdoor trigger (e.g., a stamp on an image). The poisoned model's behavior on clean data, however, remains unchanged. Many detection algorithms are designed to detect backdoors on input samples or model parameters through the statistical difference between the latent representations of adversarial and clean input samples in the poisoned model. In this paper, we design an adversarial backdoor embedding algorithm that can bypass the existing detection algorithms, including the state-of-the-art techniques. We design an adaptive adversarial training algorithm that optimizes the original loss function of the model and also maximizes the indistinguishability of the hidden representations of poisoned data and clean data. This work calls for designing adversary-aware defense mechanisms for backdoor detection.
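To make "statistical difference between latent representations" concrete, here is a pure-Python sketch of one well-known detector of that kind, the spectral-signature score of Tran et al. (2018): center the representations, find their top singular direction, and score each sample by its squared projection onto it. Assumptions: representations are plain lists of floats (in practice they would come from a hidden layer), and the helper names, the `k` parameter, and the synthetic data are hypothetical. This is not the paper's code, and the detectors it evaluates may be configured differently.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def top_singular_direction(centered, iters=200, seed=0):
    """Approximate the top right singular vector of the centered matrix C
    by power iteration on C^T C (no external libraries needed)."""
    rng = random.Random(seed)
    dim = len(centered[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # u = C v: projection of every sample onto the current direction.
        u = [sum(x * y for x, y in zip(row, v)) for row in centered]
        # w = C^T u: map the projections back into feature space.
        w = [sum(u[i] * row[j] for i, row in enumerate(centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def spectral_scores(reps):
    """Squared projection of each centered representation onto the top
    singular direction; large scores mark likely outliers."""
    mu = mean_vector(reps)
    centered = [[x - m for x, m in zip(row, mu)] for row in reps]
    v = top_singular_direction(centered)
    return [sum(x * y for x, y in zip(row, v)) ** 2 for row in centered]


def flag_top_k(reps, k):
    """Return the indices of the k highest-scoring samples and all scores.
    k is a hypothetical knob, e.g. set from an assumed poisoning rate."""
    scores = spectral_scores(reps)
    ranked = sorted(range(len(reps)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k]), scores


# Hypothetical synthetic data: 95 "clean" and 5 "poisoned" 5-d representations.
# The poisoned ones are shifted by +10 along feature 0, mimicking a backdoor
# that leaves a distinct trace in the latent space.
rng = random.Random(1)
clean = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(95)]
poison = [[rng.gauss(0.0, 1.0) + (10.0 if j == 0 else 0.0) for j in range(5)]
          for _ in range(5)]
reps = clean + poison
flagged, scores = flag_top_k(reps, k=5)
print(sorted(flagged))
```

Here the shifted points dominate the top singular direction, so they receive the largest scores. An embedding method of the kind the abstract describes aims to remove exactly this separation: if poisoned representations follow the same distribution as clean ones, the top direction no longer points at them and the flagged set carries little information.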

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13409/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13409/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1905.13409/full.md

---
Source: https://tomesphere.com/paper/1905.13409