TRIGS: Trojan Identification from Gradient-based Signatures
Mohamed E. Hussein, Sudharshan Subramaniam Janakiraman, Wael AbdAlmageed

TL;DR
TRIGS is a novel method that detects Trojan-infected models by analyzing gradient-based signatures, achieving state-of-the-art results on multiple datasets, including a new challenging ImageNet transformer dataset.
Contribution
The paper introduces TRIGS, a new approach for Trojan detection using activation optimization signatures, and demonstrates its effectiveness across various models and datasets, including vision transformers.
Findings
TRIGS outperforms baseline methods on public datasets.
It requires only a small number of clean samples for effective detection.
TRIGS performs well even without prior knowledge of attack architecture.
Abstract
Training machine learning models can be very expensive or even unaffordable. This may be, for example, due to data limitations, such as unavailability or being too large, or computational power limitations. Therefore, it is a common practice to rely on open-source pre-trained models whenever possible. However, this practice is alarming from a security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model such that the model's behavior can be controlled by the attacker when the trigger is present in the input. In this paper, we present a novel method for detecting Trojan models. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. We call our method TRIGS for TRojan Identification from Gradient-based Signatures. TRIGS…
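The abstract describes building a model signature through activation optimization. As a rough illustration of that idea only, the sketch below runs gradient ascent on the input of a toy linear softmax model to maximize each class's log-probability, then concatenates the optimized inputs into a signature vector. Everything here is an assumption made for illustration: the toy model, the hand-written gradient, the weight-decay term, and the names `maximize_class` and `signature` are hypothetical and are not taken from the paper, which targets deep networks and may construct its signatures differently.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # Toy stand-in for a network: logits = W x (one row of W per class).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def maximize_class(W, c, steps=200, lr=0.1, decay=0.1):
    """Gradient ascent on log p(c | x) - (decay / 2) * ||x||^2, from x = 0."""
    dim = len(W[0])
    x = [0.0] * dim
    for _ in range(steps):
        p = softmax(logits(W, x))
        # d/dx log softmax_c(W x) = W[c] - sum_k p_k W[k]; decay keeps x bounded.
        grad = [
            W[c][j] - sum(p[k] * W[k][j] for k in range(len(W))) - decay * x[j]
            for j in range(dim)
        ]
        x = [xj + lr * gj for xj, gj in zip(x, grad)]
    return x

def signature(W):
    # Hypothetical signature: optimized inputs for every class, concatenated.
    sig = []
    for c in range(len(W)):
        sig.extend(maximize_class(W, c))
    return sig

# Two classes, three input features (assumed toy weights).
W_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sig = signature(W_toy)
```

In a pipeline like the one the abstract outlines, signatures of this kind would be computed for many clean and Trojan models and passed to a separate classifier; that training step is omitted here.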
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Explainable Artificial Intelligence (XAI)
Methods: Attention Is All You Need · Residual Connection · Layer Normalization · Dense Connections · Softmax · Linear Layer · Multi-Head Attention · Vision Transformer
