Scalable Backdoor Detection in Neural Networks
Haripriya Harikumar, Vuong Le, Santu Rana, Sourangshu Bhattacharya, Sunil Gupta, and Svetha Venkatesh

TL;DR
This paper introduces a scalable, efficient backdoor detection method for neural networks that effectively distinguishes Trojaned models from clean ones, outperforming existing approaches in accuracy and computational cost.
Contribution
A novel trigger reverse-engineering approach that is scalable and universal, improving detection accuracy and efficiency over prior methods.
Findings
Achieves perfect separation between Trojaned and clean models.
Computational complexity does not increase with the number of labels.
Outperforms current state-of-the-art detection methods.
Abstract
Recently, it has been shown that deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and which is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
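For readers new to the threat model above, the sketch below shows one common way a "small trigger patch" is modeled in the backdoor literature: a mask m marks the trigger pixels and a pattern p fills them, so a contaminated input is x' = (1 - m) * x + m * p (element-wise). This formulation, the helper names `stamp_trigger` and `corner_patch_mask`, and the 3x3 corner patch are illustrative assumptions; this is not the paper's attack setup or its reverse-engineering procedure.

```python
import numpy as np


def stamp_trigger(image: np.ndarray, mask: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Blend a trigger into an image: x' = (1 - m) * x + m * p.

    image, pattern: arrays of shape (H, W, C) with values in [0, 1].
    mask: array of shape (H, W, 1) with values in [0, 1]; 1 marks trigger pixels.
    """
    if not (0.0 <= mask.min() and mask.max() <= 1.0):
        raise ValueError("mask values must lie in [0, 1]")
    # Broadcasting applies the single-channel mask to every color channel.
    return (1.0 - mask) * image + mask * pattern


def corner_patch_mask(height: int, width: int, size: int) -> np.ndarray:
    """Hypothetical helper: binary mask covering a size x size bottom-right patch."""
    mask = np.zeros((height, width, 1))
    mask[height - size:, width - size:, :] = 1.0
    return mask


# Usage: stamp a 3x3 white square onto a uniformly gray 8x8 RGB image.
image = np.full((8, 8, 3), 0.5)
pattern = np.ones((8, 8, 3))  # white trigger pattern
mask = corner_patch_mask(8, 8, 3)
poisoned = stamp_trigger(image, mask, pattern)
print(poisoned[7, 7], poisoned[0, 0])  # trigger pixel vs. untouched pixel
```

Trigger reverse-engineering methods such as Neural Cleanse treat m and p as unknowns and optimize them so that a small mask flips the model's predictions; Neural Cleanse runs that search once per candidate target label, which is the per-label cost the abstract says this paper avoids.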
