Towards Effective and Robust Neural Trojan Defenses via Input Filtering
Kien Do, Haripriya Harikumar, Hung Le, Dung Nguyen, Truyen Tran, Santu Rana, Dang Nguyen, Willy Susilo, Svetha Venkatesh

TL;DR
This paper introduces input filtering defenses for neural networks that use lossy data compression and adversarial learning to detect and mitigate sophisticated Trojan attacks without assuming anything about the triggers or target classes.
Contribution
The paper proposes two new filtering defenses, VIF and AIF, and a combined mechanism FtC, which improve robustness against advanced Trojan attacks without assumptions on triggers or target classes.
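The idea of filtering an input at run time can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `lossy_filter` is a fixed block-averaging stand-in rather than the learned VIF or AIF filter, `toy_classifier` is a hypothetical backdoored model, and the step that compares predictions on the raw and filtered input is only one plausible way to combine filtering with a check, not a description of how FtC is defined in the paper.

```python
import numpy as np

def lossy_filter(x, block=4):
    """Stand-in lossy compressor (assumption, not the paper's learned filter):
    average each block x block patch, then upsample back to the input size."""
    h, w = x.shape
    pooled = x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, block, axis=0), block, axis=1)

def toy_classifier(x):
    """Hypothetical backdoored classifier: a bright 2x2 patch in the
    top-left corner acts as the trigger and forces class 1."""
    return 1 if x[:2, :2].mean() > 0.9 else 0

def filter_then_compare(x, classifier, filt):
    """Classify the raw and the filtered input; flag the input when the
    two predictions disagree. Returns (filtered prediction, flagged)."""
    y_raw = classifier(x)
    y_filtered = classifier(filt(x))
    return y_filtered, y_raw != y_filtered

# A clean 8x8 image and a copy carrying the hypothetical trigger patch.
clean = np.full((8, 8), 0.2)
trojaned = clean.copy()
trojaned[:2, :2] = 1.0

clean_result = filter_then_compare(clean, toy_classifier, lossy_filter)
trojan_result = filter_then_compare(trojaned, toy_classifier, lossy_filter)
print(clean_result, trojan_result)
```

In this toy setting the block averaging dilutes the small trigger patch, so the filtered prediction returns to the clean class and the disagreement flags the input. A fixed averaging filter would not be expected to handle large or input-specific triggers; it only shows the control flow.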
Findings
The proposed defenses significantly outperform baseline defenses in experiments
They are robust against multiple advanced Trojan attack types
They remain effective with limited training data and large triggers
Abstract
Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple, input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes, and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF), which leverage lossy data compression and adversarial learning, respectively, to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Cardiac Arrest and Resuscitation · Explainable Artificial Intelligence (XAI)
Methods: Variational Inference
