Solving Trojan Detection Competitions with Linear Weight Classification
Todd Huster, Peter Lin, Razvan Stefanescu, Emmanuel Ekwedike, Ritu Chadha

TL;DR
This paper presents a linear weight classification method for Trojan detection in neural networks, achieving high accuracy across multiple benchmarks by training a binary classifier on processed model weights.
Contribution
The paper introduces a novel linear weight-based classifier for Trojan detection that does not require access to triggered data, improving detection across various datasets.
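The sketch below illustrates what a linear classifier over model weights can look like. It flattens each model's weight arrays into one vector and fits a logistic-regression detector with plain NumPy gradient descent. Only weights and clean/poisoned labels are used, with no triggered inputs. Everything here is an assumption for illustration and is not the authors' implementation: the helper names (`flatten_weights`, `train_logistic`, `predict_proba`), the choice of logistic regression, the hyperparameters, and the synthetic "models" with a planted weight offset standing in for a Trojan signature.

```python
import numpy as np

def flatten_weights(layers):
    """Concatenate one model's weight arrays into a single feature vector."""
    return np.concatenate([np.asarray(w, dtype=float).ravel() for w in layers])

def train_logistic(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a linear (logistic-regression) detector by full-batch gradient descent.

    X: (n_models, n_features) weight vectors; y: 1 = poisoned, 0 = clean.
    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(poisoned)
        err = p - y                              # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ err / n + l2 * w)       # L2-regularized weight update
        b -= lr * err.mean()
    return w, b

def predict_proba(X, w, b):
    """Probability that each model (row of X) is poisoned."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic demo: 200 toy "models", each a weight matrix plus a bias vector.
rng = np.random.default_rng(0)
models, labels = [], []
for i in range(200):
    poisoned = i % 2
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
    if poisoned:
        W1[0, 0] += 4.0  # planted offset standing in for a Trojan signature
    models.append([W1, b1])
    labels.append(poisoned)

X = np.stack([flatten_weights(m) for m in models])
y = np.array(labels, dtype=float)
w, b = train_logistic(X, y)
train_acc = ((predict_proba(X, w, b) > 0.5) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```

A real evaluation would score the detector on held-out models; training accuracy is printed here only to show that the pieces fit together.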
Findings
High detection accuracy across multiple benchmarks
Effectiveness depends on dataset and domain characteristics
Pre-processing steps significantly improve classifier performance
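The findings credit pre-processing with much of the classifier's performance. Below is a minimal sketch of three steps named in the abstract (reference model weight subtraction, standardization, and feature selection). It assumes each model comes with the flattened weights of its own reference model, for example a pre-trained checkpoint it was fine-tuned from. Subtraction is done per model because a single shared reference vector would cancel out under the standardization that follows. The selection score (absolute gap between class means) and the names `fit_preprocessor` and `transform` are hypothetical, and the paper's exact procedure may differ.

```python
import numpy as np

def fit_preprocessor(X, R, y, k):
    """Fit reference subtraction, standardization and feature selection.

    X: (n_models, n_features) flattened weights of training models.
    R: (n_models, n_features) flattened weights of each model's reference
       model (assumed to be available; a hypothetical input).
    y: labels, 1 = poisoned, 0 = clean.  k: number of features to keep.
    """
    D = X - R                                   # reference-weight subtraction
    mu = D.mean(axis=0)
    sigma = D.std(axis=0) + 1e-8                # guard against zero variance
    Z = (D - mu) / sigma                        # per-feature standardization
    # Assumed selection criterion: gap between class means of standardized features.
    score = np.abs(Z[y == 1].mean(axis=0) - Z[y == 0].mean(axis=0))
    keep = np.argsort(score)[::-1][:k]          # indices of the k highest scores
    return {"mu": mu, "sigma": sigma, "keep": keep}

def transform(X, R, params):
    """Apply training-set statistics to any models (training or test)."""
    Z = (X - R - params["mu"]) / params["sigma"]
    return Z[:, params["keep"]]

# Synthetic demo: models fine-tuned from one of two hypothetical base checkpoints.
rng = np.random.default_rng(1)
n, d = 100, 50
y = np.arange(n) % 2
bases = rng.normal(scale=10.0, size=(2, d))     # large differences between bases
R = bases[rng.integers(0, 2, size=n)]           # each model's reference weights
X = R + rng.normal(size=(n, d))
X[y == 1, 7] += 3.0                             # planted Trojan-like signature
params = fit_preprocessor(X, R, y, k=5)
Z = transform(X, R, params)
print(Z.shape, params["keep"][0])
```

Fitting `mu`, `sigma`, and `keep` on training models only, then reusing them on test models, keeps test labels out of the pre-processing.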
Abstract
Neural networks can conceal malicious Trojan backdoors that allow a trigger to covertly change the model behavior. Detecting signs of these backdoors, particularly without access to any triggered data, is the subject of ongoing research and open challenges. In one common formulation of the problem, we are given a set of clean and poisoned models and need to predict whether a given test model is clean or poisoned. In this paper, we introduce a detector that works remarkably well across many of the existing datasets and domains. It is obtained by training a binary classifier on a large number of models' weights after performing a few different pre-processing steps including feature selection and standardization, reference model weights subtraction, and model alignment prior to detection. We evaluate this algorithm on a diverse set of Trojan detection benchmarks and domains and examine the…
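Model alignment matters because the hidden units in a layer can be permuted without changing what the network computes, so two functionally identical models can have very different flattened weight vectors. As a simple stand-in for whatever alignment the authors use, the sketch below puts the hidden units of a hypothetical one-hidden-layer MLP into a canonical order by sorting them on the norm of their incoming weights and bias. The helpers (`permute_hidden`, `canonicalize`, `forward`) are illustrative. The sort assumes no two units have equal norms, and matching units to a reference model is a common alternative.

```python
import numpy as np

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units of a one-hidden-layer MLP; its function is unchanged."""
    return W1[perm], b1[perm], W2[:, perm]

def canonicalize(W1, b1, W2):
    """Sort hidden units by the norm of their incoming weights plus bias.

    A simple alignment stand-in; assumes the norms are distinct (no ties).
    """
    incoming = np.concatenate([W1, b1[:, None]], axis=1)
    order = np.argsort(np.linalg.norm(incoming, axis=1), kind="stable")
    return permute_hidden(W1, b1, W2, order)

def forward(x, W1, b1, W2):
    """ReLU MLP forward pass for a single input vector x."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# Demo: a randomly permuted copy computes the same function,
# and both copies map to identical weights after canonicalization.
rng = np.random.default_rng(2)
W1, b1, W2 = rng.normal(size=(6, 4)), rng.normal(size=6), rng.normal(size=(3, 6))
perm = rng.permutation(6)
P1, pb1, P2 = permute_hidden(W1, b1, W2, perm)
x = rng.normal(size=4)
same_function = np.allclose(forward(x, W1, b1, W2), forward(x, P1, pb1, P2))
A = canonicalize(W1, b1, W2)
B = canonicalize(P1, pb1, P2)
aligned = all(np.allclose(a, c) for a, c in zip(A, B))
print(same_function, aligned)
```

After alignment, the flattened weight vectors of permuted copies coincide, so a linear classifier sees the same features for functionally identical models.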
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training · Feature Selection
