Defense Against Multi-target Trojan Attacks

Haripriya Harikumar; Santu Rana; Kien Do; Sunil Gupta; Wei Zong; Willy Susilo; Svetha Venkatesh

arXiv:2207.03895·cs.CV·July 11, 2022

Open Access

TL;DR

This paper presents a novel defense mechanism against multi-target Trojan attacks in deep learning, using trigger reverse-engineering and transferability measures to detect malicious backdoors effectively.

Contribution

It introduces a new threat model with multi-target backdoors and a detection method based on trigger transferability, outperforming existing defenses.

Findings

01 High detection accuracy across multiple datasets

02 Effective reverse-engineering of diverse triggers

03 Superior performance compared to state-of-the-art methods

Abstract

Adversarial attacks on deep learning-based models pose a significant threat to the current AI infrastructure. Among them, Trojan attacks are the hardest to defend against. In this paper, we first introduce a variation of the BadNet-style attack that plants Trojan backdoors for multiple target classes and allows triggers to be placed anywhere in the image. The former makes the attack more potent, and the latter makes it extremely easy to carry out in physical space. State-of-the-art Trojan detection methods fail under this threat model. To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers. We then propose a detection mechanism that measures the transferability of the recovered triggers. A Trojan trigger will have very high transferability, i.e., it makes other images…
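The transferability idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that transferability is scored as the fraction of other images classified into a given class once a candidate trigger patch is pasted onto them, and the names `apply_patch`, `transferability`, and `toy_predict` are hypothetical. The toy classifier only stands in for a Trojaned model.

```python
import numpy as np

def apply_patch(images, patch, top, left):
    """Paste a patch onto a copy of each image at position (top, left)."""
    stamped = images.copy()
    h, w = patch.shape[:2]
    stamped[:, top:top + h, left:left + w] = patch
    return stamped

def transferability(predict, images, patch, target, top=0, left=0):
    """Assumed score: fraction of images predicted as `target` after patching."""
    preds = predict(apply_patch(images, patch, top, left))
    return float(np.mean(preds == target))

def toy_predict(batch):
    """Hypothetical stand-in for a Trojaned model: outputs class 1 when the
    top-left 2x2 block is saturated (all pixels >= 1.0), otherwise class 0."""
    return (batch[:, :2, :2].min(axis=(1, 2)) >= 1.0).astype(int)

rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.9, size=(8, 6, 6))  # eight clean grayscale images

trojan_patch = np.ones((2, 2))        # candidate that matches the backdoor
benign_patch = np.full((2, 2), 0.5)   # candidate that does not

score_trojan = transferability(toy_predict, images, trojan_patch, target=1)
score_benign = transferability(toy_predict, images, benign_patch, target=1)
print(score_trojan, score_benign)
```

In this toy setup the matching patch flips every image to the target class while the benign patch flips none, which is the kind of gap a transferability-based detector would threshold on. How the candidate triggers are recovered, and how the threshold is chosen, is left to the paper.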

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Digital Media Forensic Detection