TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning

Yupei Liu; Yanting Wang; Jinyuan Jia

arXiv:2501.04108 · cs.CR · February 5, 2025

TL;DR

TrojanDec is a data-free method: without requiring any training data, it detects trojaned test inputs to self-supervised image encoders and restores them by removing the trigger.

Contribution

It introduces TrojanDec, the first data-free approach for detecting trojaned test inputs to self-supervised encoders and removing their triggers.

Findings

1. Effectively identifies trojaned inputs under state-of-the-art attacks.
2. Successfully recovers trojaned inputs to maintain model utility.
3. Outperforms existing trojan detection methods.

Abstract

An image encoder pre-trained by self-supervised learning can be used as a general-purpose feature extractor to build classifiers for various downstream tasks. However, many studies have shown that an attacker can embed a trojan into an encoder such that multiple downstream classifiers built on the trojaned encoder simultaneously inherit the trojan behavior. In this work, we propose TrojanDec, the first data-free method to identify a test input embedded with a trigger and recover it. Given a (trojaned or clean) encoder and a test input, TrojanDec first predicts whether the test input is trojaned. If not, the test input is processed normally to maintain utility. Otherwise, the test input is further restored to remove the trigger. Our extensive evaluation shows that TrojanDec can effectively identify the trojan (if any) in a given test input and recover it under…
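The detect-then-restore flow described in the abstract can be sketched as a toy. This is an illustration under loud assumptions and does not reproduce TrojanDec's actual detection or restoration procedure. Only the branching comes from the abstract: clean inputs are encoded unchanged, and flagged inputs are restored before encoding. Two pieces are hypothetical stand-ins:

- **Detector:** flags an input if zero-masking one small patch makes the encoder's feature vector change drastically, measured by cosine similarity.
- **Restoration step:** fills the suspicious patch with the mean of the surrounding pixels.

The names `toy_trojaned_encoder`, `find_suspicious_patch`, `fill_patch`, `patch`, and `threshold` are illustrative only and do not come from the paper.

```python
import math
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # hypothetical: grayscale image as rows of pixel values
Feature = List[float]
Encoder = Callable[[Image], Feature]


def cosine(a: Feature, b: Feature) -> float:
    """Cosine similarity; 1.0 if both vectors are zero, 0.0 if only one is."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 and nb == 0.0:
        return 1.0
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(u * v for u, v in zip(a, b)) / (na * nb)


def patch_cells(x: Image, r: int, c: int, size: int):
    """Yield (row, col) coordinates inside the size x size patch at (r, c)."""
    for i in range(r, min(r + size, len(x))):
        for j in range(c, min(c + size, len(x[0]))):
            yield i, j


def fill_patch(x: Image, r: int, c: int, size: int,
               value: Optional[float] = None) -> Image:
    """Copy x and overwrite one patch; by default use the mean of pixels outside it."""
    cells = set(patch_cells(x, r, c, size))
    if value is None:
        outside = [x[i][j] for i in range(len(x)) for j in range(len(x[0]))
                   if (i, j) not in cells]
        value = sum(outside) / len(outside) if outside else 0.0
    out = [row[:] for row in x]
    for i, j in cells:
        out[i][j] = value
    return out


def find_suspicious_patch(encoder: Encoder, x: Image, patch: int = 2,
                          threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Hypothetical detector (not the paper's): return the patch whose zero-masking
    changes the feature most, if its cosine similarity to the original drops below threshold."""
    base = encoder(x)
    worst: Optional[Tuple[float, Tuple[int, int]]] = None
    for r in range(0, len(x), patch):
        for c in range(0, len(x[0]), patch):
            sim = cosine(base, encoder(fill_patch(x, r, c, patch, value=0.0)))
            if worst is None or sim < worst[0]:
                worst = (sim, (r, c))
    if worst is not None and worst[0] < threshold:
        return worst[1]
    return None


def detect_then_restore(encoder: Encoder, x: Image, patch: int = 2,
                        threshold: float = 0.5) -> Tuple[bool, Feature]:
    """Return (flagged, feature): clean inputs are encoded as-is, flagged ones restored first."""
    loc = find_suspicious_patch(encoder, x, patch, threshold)
    if loc is None:
        return False, encoder(x)  # predicted clean: process normally
    return True, encoder(fill_patch(x, loc[0], loc[1], patch))  # restore, then encode


def toy_trojaned_encoder(x: Image) -> Feature:
    """Hypothetical backdoored encoder: a bright top-left pixel (the trigger) dominates the feature."""
    mean = sum(map(sum, x)) / (len(x) * len(x[0]))
    return [mean, 100.0 if x[0][0] == 1.0 else 0.0]


clean = [[0.5] * 4 for _ in range(4)]
trojaned = [row[:] for row in clean]
trojaned[0][0] = 1.0  # embed the toy trigger
print(detect_then_restore(toy_trojaned_encoder, clean))     # (False, [0.5, 0.0])
print(detect_then_restore(toy_trojaned_encoder, trojaned))  # (True, [0.5, 0.0])
```

In this toy, masking the trigger patch collapses the trigger-dominated feature, so the input is flagged. Refilling that patch from its surroundings then yields the same feature as the clean image. A practical system would replace both placeholders with the paper's own detection and restoration steps.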


Videos

TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning (Underline)
