Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks
Charles Jin, Melinda Sun, Martin Rinard

TL;DR
This paper introduces an incompatibility-based clustering method to detect and remove poisoned data in training datasets, significantly reducing backdoor attack success rates with minimal impact on model accuracy.
Contribution
The paper presents a novel clustering approach based on data incompatibility during training, specifically designed to defend against backdoor poisoning attacks in neural networks.
Findings
Identifies poisoned samples in the training dataset.
Reduces the attack success rate to below 1% in most scenarios.
Keeps clean accuracy high, with only a minimal drop.
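The two metrics named in these findings can be sketched as follows. This is a minimal illustration that assumes the commonly used definitions, which this summary does not spell out: clean accuracy is the fraction of unmodified test inputs classified correctly, and attack success rate is the fraction of triggered inputs from non-target classes that the model assigns to the attacker's target class. The function names and sample values are hypothetical.

```python
def clean_accuracy(preds, labels):
    """Fraction of clean (untriggered) inputs classified correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(triggered_preds, true_labels, target):
    """Fraction of triggered inputs sent to the target class.

    Assumption: inputs whose true class already equals the target are
    excluded, since predicting the target for them is not a success
    attributable to the backdoor.
    """
    eligible = [p for p, y in zip(triggered_preds, true_labels) if y != target]
    return sum(p == target for p in eligible) / len(eligible)


# Hypothetical predictions, for illustration only.
acc = clean_accuracy([0, 1, 2, 2], [0, 1, 2, 1])
asr = attack_success_rate([1, 0, 1, 1], [0, 0, 1, 2], target=1)
```

A defense is doing well on these measures when `asr` falls toward zero while `acc` stays close to that of a model trained on clean data.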
Abstract
We propose a novel clustering mechanism based on an incompatibility property between subsets of data that emerges during model training. This mechanism partitions the dataset into subsets that generalize only to themselves, i.e., training on one subset does not improve performance on the other subsets. Leveraging the interaction between the dataset and the training process, our clustering mechanism partitions datasets into clusters that are defined by--and therefore meaningful to--the objective of the training process. We apply our clustering mechanism to defend against data poisoning attacks, in which the attacker injects malicious poisoned data into the training dataset to affect the trained model's output. Our evaluation focuses on backdoor attacks against deep neural networks trained to perform image classification using the GTSRB and CIFAR-10 datasets. Our results show that (1)…
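The incompatibility property described in this abstract can be illustrated with a toy check. The sketch below is not the paper's clustering algorithm. It assumes a 1-nearest-neighbor learner, a binary task with a chance level of 0.5, and a hypothetical trigger encoded as the second feature, and it treats two subsets as incompatible when training on either one yields no better than chance accuracy on the other.

```python
from math import dist


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    _, label = min(train, key=lambda item: dist(item[0], x))
    return label


def accuracy(train, test):
    """Accuracy on `test` of a 1-NN model fitted to `train`."""
    return sum(predict_1nn(train, x) == y for x, y in test) / len(test)


def transfer_gain(train, test, chance=0.5):
    """How far above chance a model trained on `train` scores on `test`."""
    return accuracy(train, test) - chance


def incompatible(a, b, chance=0.5):
    """True if neither subset helps the other beyond chance (assumed criterion)."""
    return transfer_gain(a, b, chance) <= 0 and transfer_gain(b, a, chance) <= 0


# Toy data: each item is ((x, trigger), label). Clean label is 1 when x > 0.
clean_a = [((-2.0, 0.0), 0), ((-1.0, 0.0), 0), ((1.0, 0.0), 1), ((2.0, 0.0), 1)]
clean_b = [((-1.5, 0.0), 0), ((-0.5, 0.0), 0), ((0.5, 0.0), 1), ((1.5, 0.0), 1)]
# Poisoned points carry the trigger and are all relabeled to target class 1.
poison = [((-2.0, 1.0), 1), ((-1.0, 1.0), 1), ((-1.5, 1.0), 1)]

clean_vs_clean = incompatible(clean_a, clean_b)
clean_vs_poison = incompatible(clean_a, poison)
```

In this toy setup the two clean halves generalize to each other, while the clean and poisoned subsets do not, which is the kind of separation a defense built on this property would look for.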
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
