MDTD: A Multi Domain Trojan Detector for Deep Neural Networks

Arezoo Rajabi; Surudhi Asokraj; Fengqing Jiang; Luyao Niu; Bhaskar Ramasubramanian; Jim Ritcey; Radha Poovendran

arXiv:2308.15673·cs.CR·September 6, 2023·1 cites

TL;DR

This paper introduces MDTD, a novel method for detecting Trojan triggers in deep neural networks across multiple data domains without prior knowledge of trigger strategies, using adversarial boundary distance estimation.

Contribution

MDTD is the first multi-domain Trojan detector that does not require knowledge of the attacker's trigger-embedding strategy and that identifies Trojaned inputs across diverse datasets.

Findings

1. MDTD outperforms existing Trojan detection methods.
2. MDTD remains effective against adaptive attacks that modify decision boundary distances.
3. MDTD applies to image, audio, and graph data.

Abstract

Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation, called a trigger, into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining, however, needs to ensure that outputs for inputs without the trigger remain unaffected and that the model retains high classification accuracy on clean samples. In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of the attacker's trigger-embedding strategy and can be applied to a pre-trained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are…
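
To make the threat model in the abstract concrete, the following is a minimal sketch of how a trigger might be embedded into a small subset of training samples. It is an illustration under stated assumptions, not the attack or data pipeline used in the paper: the function name `poison_dataset`, the white-square trigger, the 10% poisoning fraction, and the toy 8x8 "images" are all hypothetical choices.

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_fraction=0.1,
                   patch_size=3, seed=0):
    """Stamp a trigger on a random subset of samples and relabel them.

    Hypothetical helper: the trigger here is a white square in the
    bottom-right corner, one simple choice among many possible triggers.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()   # leave the clean data untouched
    labels = labels.copy()
    n_poison = int(round(poison_fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Embed the trigger: overwrite a small corner patch with the value 1.0.
    images[idx, -patch_size:, -patch_size:] = 1.0
    # Relabel the triggered samples to the adversary-desired class.
    labels[idx] = target_class
    return images, labels, idx

# Toy stand-in data (assumption): 100 grayscale 8x8 images, 10 classes.
data_rng = np.random.default_rng(42)
clean_images = data_rng.random((100, 8, 8)) * 0.5
clean_labels = data_rng.integers(0, 10, size=100)

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    clean_images, clean_labels, target_class=7
)
```

Training a classifier on `poisoned_images` and `poisoned_labels` would be the adversarial retraining step the abstract describes. Because only a small fraction of samples is modified, the remaining clean samples keep their original labels, which is what lets the model retain high accuracy on inputs without the trigger.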

Code & Models

Repositories

rajabia/mdtd (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning