An Adaptive Black-box Defense against Trojan Attacks (TrojDef)

Guanxiong Liu; Abdallah Khreishah; Fatima Sharadgah; Issa Khalil

arXiv:2209.01721·cs.CR·September 7, 2022

Open Access

TL;DR

This paper introduces TrojDef, a practical black-box defense against Trojan attacks on neural networks. It detects Trojan inputs by monitoring how stable the prediction confidence remains when the input is repeatedly perturbed with random noise, using only forward passes and no access to model internals.

Contribution

The work proposes TrojDef, a novel black-box detection approach based on prediction confidence bounds, which outperforms existing defenses and is robust across various settings.

Findings

01

TrojDef effectively detects Trojan inputs using confidence stability analysis.

02

It outperforms state-of-the-art defenses in accuracy and robustness.

03

TrojDef remains stable under different model architectures and training conditions.

Abstract

A Trojan backdoor is a poisoning attack against Neural Network (NN) classifiers in which adversaries exploit the (highly desirable) model reuse property to implant Trojans into model parameters through a poisoned training process, enabling backdoor breaches. Most proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or is able to run back-propagation through it. In this work, we propose a more practical black-box defense, dubbed TrojDef, which only needs to run the forward pass of the NN. TrojDef tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in the prediction confidence when the input is repeatedly perturbed by random noise. We derive a function based on the prediction outputs, called the prediction confidence bound, to decide…
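
The black-box probing step described in the abstract (forward passes only, input repeatedly perturbed by random noise) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `confidence_under_noise`, its parameters `n_trials`, `sigma`, and `seed`, the Gaussian noise model, and the toy classifier `toy_predict_proba` are all assumptions made for the example. The mean confidence computed at the end is a simple stand-in statistic, not the paper's prediction confidence bound, whose definition is not given in the text above; the decision threshold is likewise left out.

```python
import numpy as np


def confidence_under_noise(predict_proba, x, n_trials=32, sigma=0.1, seed=0):
    """Probe a black-box classifier with noisy copies of one input.

    predict_proba is a hypothetical callable mapping a batch of inputs
    (n, ...) to class probabilities (n, k); only forward passes are used.
    Returns the class predicted on the clean input and the confidence the
    model assigns to that class on each noisy copy.
    """
    rng = np.random.default_rng(seed)
    clean_probs = predict_proba(x[None, ...])[0]
    label = int(np.argmax(clean_probs))
    # Assumed noise model: i.i.d. Gaussian noise added to every feature.
    noise = rng.normal(0.0, sigma, size=(n_trials,) + x.shape)
    noisy_probs = predict_proba(x[None, ...] + noise)
    return label, noisy_probs[:, label]


def toy_predict_proba(batch):
    """Hypothetical stand-in model: a fixed linear layer followed by softmax."""
    w = np.array([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]])  # 3 features, 2 classes
    logits = batch.reshape(len(batch), -1) @ w
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


x = np.array([2.0, 0.0, 1.0])
label, confs = confidence_under_noise(toy_predict_proba, x)
# Stand-in stability statistic (NOT the paper's prediction confidence bound).
score = float(confs.mean())
```

A detector built on this would compare a statistic of `confs` against a threshold calibrated on known-benign inputs; how that statistic and threshold are defined in TrojDef is not recoverable from the truncated abstract.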

Taxonomy

Topics: Adversarial Robustness in Machine Learning