TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning
Yupei Liu, Yanting Wang, Jinyuan Jia

TL;DR
TrojanDec is a data-free method that detects trojaned test inputs and restores them by removing the trigger, protecting encoders pre-trained with self-supervised learning without requiring training data.
Contribution
The paper introduces TrojanDec, the first data-free approach for detecting trojaned test inputs and removing their triggers for encoders pre-trained with self-supervised learning.
Findings
Effectively identifies Trojaned inputs under state-of-the-art attacks.
Successfully recovers Trojaned inputs to maintain model utility.
Outperforms existing Trojan detection methods.
Abstract
An image encoder pre-trained by self-supervised learning can be used as a general-purpose feature extractor to build classifiers for various downstream tasks. However, many studies have shown that an attacker can embed a trojan into an encoder such that multiple downstream classifiers built on the trojaned encoder simultaneously inherit the trojan behavior. In this work, we propose TrojanDec, the first data-free method to identify and recover a test input embedded with a trigger. Given a (trojaned or clean) encoder and a test input, TrojanDec first predicts whether the test input is trojaned. If not, the test input is processed in the normal way to maintain utility. Otherwise, the test input is further restored to remove the trigger. Our extensive evaluation shows that TrojanDec can effectively identify the trojan (if any) from a given test input and recover it under…
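The detect-then-restore flow described in the abstract can be sketched as follows. This is only an illustration of the control flow, not the authors' implementation: the abstract does not say how detection or restoration work, so `is_trojaned` and `restore` are hypothetical callables supplied by the caller, and the toy encoder, detector, and restorer exist purely to make the example runnable.

```python
from typing import Callable, List, Tuple

Image = List[float]      # toy stand-in for an image (assumption)
Features = List[float]   # toy stand-in for an encoder's feature vector
Encoder = Callable[[Image], Features]


def process_test_input(
    x: Image,
    encoder: Encoder,
    is_trojaned: Callable[[Encoder, Image], bool],  # hypothetical detector
    restore: Callable[[Image], Image],              # hypothetical restorer
) -> Tuple[Features, bool]:
    """Return (features, flagged) following the flow in the abstract."""
    if is_trojaned(encoder, x):
        # Flagged input: remove the trigger first, then encode.
        return encoder(restore(x)), True
    # Unflagged input: encode as usual so utility is unaffected.
    return encoder(x), False


# --- Toy components, invented for this example only ---
TRIGGER_VALUE = 9.0  # pretend trigger: last element set to this value


def toy_encoder(x: Image) -> Features:
    return [sum(x), max(x)]


def toy_is_trojaned(encoder: Encoder, x: Image) -> bool:
    # A real detector would query the encoder; this toy one ignores it.
    return x[-1] == TRIGGER_VALUE


def toy_restore(x: Image) -> Image:
    # Overwrite the pretend trigger position with a neutral value.
    return x[:-1] + [0.0]


clean = [0.1, 0.2, 0.3]
stamped = [0.1, 0.2, TRIGGER_VALUE]

clean_out = process_test_input(clean, toy_encoder, toy_is_trojaned, toy_restore)
stamped_out = process_test_input(stamped, toy_encoder, toy_is_trojaned, toy_restore)
```

Passing the detector and restorer in as parameters keeps the sketch neutral about how either step is realized; only the branching (encode directly when the input is not flagged, restore and then encode when it is) comes from the abstract.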
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Integrated Circuits and Semiconductor Failure Analysis · Cell Image Analysis Techniques
