Solving Trojan Detection Competitions with Linear Weight Classification

Todd Huster; Peter Lin; Razvan Stefanescu; Emmanuel Ekwedike; Ritu Chadha

arXiv:2411.03445 · cs.LG · November 7, 2024

TL;DR

This paper presents a linear weight classification method for Trojan detection in neural networks, achieving high accuracy across multiple benchmarks by training a binary classifier on processed model weights.

Contribution

The paper introduces a linear classifier trained on model weights for Trojan detection. It requires no access to triggered data and works well across many existing Trojan detection datasets and domains.

Findings

01 High detection accuracy across multiple benchmarks

02 Effectiveness depends on dataset and domain characteristics

03 Pre-processing steps significantly improve classifier performance

Abstract

Neural networks can conceal malicious Trojan backdoors that allow a trigger to covertly change the model behavior. Detecting signs of these backdoors, particularly without access to any triggered data, is the subject of ongoing research and open challenges. In one common formulation of the problem, we are given a set of clean and poisoned models and need to predict whether a given test model is clean or poisoned. In this paper, we introduce a detector that works remarkably well across many of the existing datasets and domains. It is obtained by training a binary classifier on a large number of models' weights after performing a few different pre-processing steps including feature selection and standardization, reference model weights subtraction, and model alignment prior to detection. We evaluate this algorithm on a diverse set of Trojan detection benchmarks and domains and examine the…
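The pipeline the abstract outlines (reference weight subtraction, standardization, feature selection, then a linear binary classifier over model weights) can be illustrated with a small sketch. This is not the authors' implementation. The synthetic "models", the function names, the mean-difference selection score, and the gradient-descent logistic regression are all assumptions made for illustration, and the model alignment step is omitted entirely.

```python
import numpy as np

def make_synthetic_models(n_models, dim, references, rng, shift=0.3, n_affected=10):
    """Hypothetical stand-in for a benchmark: each 'model' is a flattened weight
    vector fine-tuned from one of several reference models; poisoned models
    (label 1) get a small shift in a few coordinates."""
    base = rng.integers(0, len(references), size=n_models)
    labels = rng.integers(0, 2, size=n_models)
    weights = references[base] + 0.1 * rng.normal(size=(n_models, dim))
    weights[labels == 1, :n_affected] += shift
    return weights, labels, base

def fit_detector(weights, labels, base, references, k=20, steps=500, lr=0.5):
    # 1) Reference subtraction: keep only the change relative to each model's reference.
    deltas = weights - references[base]
    # 2) Standardization with statistics from the training models only.
    mean, std = deltas.mean(axis=0), deltas.std(axis=0) + 1e-8
    z = (deltas - mean) / std
    # 3) Feature selection (assumed criterion): largest gap between class means.
    score = np.abs(z[labels == 1].mean(axis=0) - z[labels == 0].mean(axis=0))
    selected = np.argsort(score)[-k:]
    # 4) Linear classifier: logistic regression fitted by plain gradient descent.
    X, y = z[:, selected], labels.astype(float)
    w, b = np.zeros(k), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return {"mean": mean, "std": std, "selected": selected, "w": w, "b": b}

def predict(detector, weights, base, references):
    """Return 1 for models flagged as poisoned, 0 for models flagged as clean."""
    z = (weights - references[base] - detector["mean"]) / detector["std"]
    logits = z[:, detector["selected"]] @ detector["w"] + detector["b"]
    return (logits > 0).astype(int)

rng = np.random.default_rng(0)
dim = 200
references = rng.normal(size=(2, dim))  # two hypothetical reference models
train_w, train_y, train_base = make_synthetic_models(200, dim, references, rng)
test_w, test_y, test_base = make_synthetic_models(100, dim, references, rng)

detector = fit_detector(train_w, train_y, train_base, references)
test_accuracy = float((predict(detector, test_w, test_base, references) == test_y).mean())
print(f"held-out accuracy on synthetic data: {test_accuracy:.2f}")
```

The accuracy printed here reflects only how the toy data was generated and says nothing about the paper's benchmark results. In the sketch, subtracting each model's own reference matters because the two reference models differ far more from each other than a poisoned model differs from a clean one.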

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training · Feature Selection