Towards Effective and Robust Neural Trojan Defenses via Input Filtering

Kien Do; Haripriya Harikumar; Hung Le; Dung Nguyen; Truyen Tran; Santu Rana; Dang Nguyen; Willy Susilo; Svetha Venkatesh

arXiv:2202.12154·cs.CR·February 15, 2023

TL;DR

This paper introduces two input-filtering defenses for neural networks that use lossy data compression and adversarial learning to purify potential Trojan triggers at run time, without assuming the number of triggers or target classes or whether triggers are input-dependent.

Contribution

The paper proposes two new filtering defenses, VIF and AIF, and a combined mechanism, FtC, that improve robustness against advanced Trojan attacks without making assumptions about triggers or target classes.

Findings

01. The proposed defenses significantly outperform baseline defenses in experiments

02. They are robust against multiple advanced Trojan attack types

03. They remain effective with limited training data and large triggers

Abstract

Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple, input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes, and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF), which leverage lossy data compression and adversarial learning, respectively, to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence…
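To make the idea of run-time input filtering concrete, here is a minimal sketch in Python with NumPy. It is not the authors' VIF or AIF: the learned filter is replaced by a fixed average-pooling compressor (`lossy_filter`), the classifier is a hypothetical toy with a hand-planted backdoor (`toy_trojaned_classifier`), and the check that compares predictions on raw and filtered inputs (`looks_trojaned`) is an assumption about how a combined filter-and-compare mechanism could work, not something stated in the abstract.

```python
import numpy as np

def lossy_filter(x, block=4):
    """Stand-in lossy compressor (assumption, not the paper's learned filter):
    average-pool each block x block patch, then upsample back to full size."""
    h, w = x.shape
    assert h % block == 0 and w % block == 0, "image size must be divisible by block"
    pooled = x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, block, axis=0), block, axis=1)

def toy_trojaned_classifier(x):
    """Hypothetical classifier with a planted backdoor.
    Clean rule: class 1 if the image is bright on average, else class 0.
    Backdoor: a near-white top-left pixel forces target class 7."""
    if x[0, 0] > 0.95:
        return 7
    return 1 if x.mean() > 0.5 else 0

def classify_filtered(x):
    """Purify the input first, then classify the purified input."""
    return toy_trojaned_classifier(lossy_filter(x))

def looks_trojaned(x):
    """Flag the input when the raw and filtered predictions disagree."""
    return toy_trojaned_classifier(x) != classify_filtered(x)

# A dark clean image, and the same image with a one-pixel trigger.
clean = np.full((8, 8), 0.2)
poisoned = clean.copy()
poisoned[0, 0] = 1.0

raw_pred = toy_trojaned_classifier(poisoned)   # backdoor fires on the raw input
filtered_pred = classify_filtered(poisoned)    # trigger is averaged away
```

In this toy setting the single bright pixel is diluted across its 4x4 patch, so the filtered image no longer activates the backdoor while the clean image is classified the same way before and after filtering. A fixed pooling filter would of course fail against large or smooth triggers, which is presumably why the paper learns its filters instead.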

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Variational Inference