MDTD: A Multi Domain Trojan Detector for Deep Neural Networks
Arezoo Rajabi, Surudhi Asokraj, Fengqing Jiang, Luyao Niu, Bhaskar Ramasubramanian, Jim Ritcey, Radha Poovendran

TL;DR
This paper introduces MDTD, a novel method for detecting Trojan triggers in deep neural networks across multiple data domains without prior knowledge of trigger strategies, using adversarial boundary distance estimation.
Contribution
MDTD is the first multi-domain Trojan detector that does not require knowledge of trigger embedding and effectively identifies Trojaned inputs across diverse datasets.
Findings
MDTD outperforms existing Trojan detection methods.
It remains effective against adaptive attacks that modify decision boundary distances.
It applies to image, audio, and graph data.
Abstract
Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation called a trigger into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining however needs to ensure that outputs for inputs without the trigger remain unaffected and provide high classification accuracy on clean samples. In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of trigger-embedding strategy of the attacker and can be applied to a pre-trained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are…
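The TL;DR above mentions estimating a sample's distance to a decision boundary. The sketch below illustrates only that one idea, on a toy linear classifier standing in for a DNN. It is not the authors' procedure: the names `predict` and `boundary_distance`, the step size, and the walk-until-the-label-flips search are all assumptions made for illustration, and how MDTD turns such distances into a detection decision is not covered by the text shown here.

```python
import numpy as np


def predict(W, b, x):
    """Predicted class of a toy linear classifier with scores W @ x + b."""
    return int(np.argmax(W @ x + b))


def boundary_distance(W, b, x, step=0.01, max_steps=10_000):
    """Estimate the L2 distance from x to the nearest decision boundary.

    Hypothetical helper: for each rival class, walk from x in small steps
    along the direction that closes the score gap fastest, and record how
    far we travel before the predicted label changes. The shortest such
    walk is returned (np.inf if the label never flips).
    """
    src = predict(W, b, x)
    best = np.inf
    for k in range(W.shape[0]):
        if k == src:
            continue
        direction = W[k] - W[src]
        norm = np.linalg.norm(direction)
        if norm == 0:
            continue  # identical weight rows: this direction cannot help
        direction = direction / norm
        for n in range(1, max_steps + 1):
            if predict(W, b, x + n * step * direction) != src:
                best = min(best, n * step)
                break
    return best


# Two-class toy model whose boundary is the line x[0] = 0.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)

far_from_boundary = boundary_distance(W, b, np.array([0.5, 0.0]))
near_boundary = boundary_distance(W, b, np.array([0.05, 0.0]))
print(far_from_boundary, near_boundary)
```

With a real DNN, the fixed walking direction would typically be replaced by a gradient-based adversarial perturbation, and the resulting distance is only an upper-bound estimate whose accuracy depends on the step size.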
Taxonomy
Topics: Adversarial Robustness in Machine Learning
