Trojan Signatures in DNN Weights

Greg Fields; Mohammad Samragh; Mojan Javaheripi; Farinaz Koushanfar; Tara Javidi

arXiv:2109.02836·cs.LG·September 8, 2021

TL;DR

This paper introduces a lightweight, data-free method for detecting trojan attacks in deep neural networks by analyzing the weight distribution of the final linear layer, effectively distinguishing trojaned models from benign ones.

Contribution

The authors propose the first ultra light-weight, data-free trojan detection technique focusing on final layer weights, effective across various architectures and attack types.

Findings

01 Detects trojans without training/test data or heavy computation

02 Distinguishes trojaned models by analyzing weight distributions

03 Effective against multiple attack methods and architectures

Abstract

Deep neural networks have been shown to be vulnerable to backdoor, or trojan, attacks where an adversary has embedded a trigger in the network at training time such that the model correctly classifies all standard inputs, but generates a targeted, incorrect classification on any input which contains the trigger. In this paper, we present the first ultra light-weight and highly effective trojan detection method that does not require access to the training/test data, does not involve any expensive computations, and makes no assumptions on the nature of the trojan trigger. Our approach focuses on analysis of the weights of the final, linear layer of the network. We empirically demonstrate several characteristics of these weights that occur frequently in trojaned networks, but not in benign networks. In particular, we show that the distribution of the weights associated with the trojan…
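As a rough illustration of what a data-free inspection of the final linear layer could look like, the sketch below summarizes each output class by the mean of its incoming weights and flags classes whose mean is a robust outlier. The abstract is cut off before it names the statistic the authors use, so everything here is an assumption made for illustration: the per-class mean, the median/MAD outlier test, the 3.5 cutoff, and the function names `per_class_mean_weights` and `flag_outlier_classes` are hypothetical and not taken from the paper.

```python
from statistics import mean, median


def per_class_mean_weights(final_layer):
    """Average the incoming weights of each output class (one row per class)."""
    return [mean(row) for row in final_layer]


def flag_outlier_classes(final_layer, threshold=3.5):
    """Return indices of classes whose mean weight is a robust outlier.

    Assumption: uses the modified z-score (median and MAD). The 3.5 cutoff
    is a common rule of thumb, not a value taken from the paper.
    """
    means = per_class_mean_weights(final_layer)
    med = median(means)
    mad = median([abs(m - med) for m in means])
    if mad == 0:
        return []  # no spread to compare against
    return [
        i for i, m in enumerate(means)
        if 0.6745 * abs(m - med) / mad > threshold
    ]


# Toy final-layer weights: 5 classes x 4 features (made-up numbers).
weights = [
    [0.10, -0.20, 0.05, 0.00],
    [-0.05, 0.15, -0.10, 0.02],
    [0.90, 1.10, 0.80, 1.00],   # class 2 has unusually large weights
    [0.00, -0.10, 0.12, -0.03],
    [0.07, -0.02, -0.08, 0.01],
]

print(flag_outlier_classes(weights))  # [2]
```

The check reads only the weight matrix, which matches the data-free setting described in the abstract; whether a mean-based statistic is the right signal for a real network is left open here.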

Taxonomy

Methods: Linear Layer