Dirty and Clean-Label attack detection using GAN discriminators

John W. Smutny

arXiv:2506.01224·cs.CV·June 5, 2025

TL;DR

This paper proposes using GAN discriminators to detect mislabeled and manipulated images in training datasets, providing an efficient alternative to re-training models for poison detection.

Contribution

It introduces a novel method employing GAN discriminators to identify dirty and clean-label attacks without extensive re-training.

Findings

1. GAN discriminator confidence scores can detect 100% of tested poison attacks at epsilon 0.20

2. The method effectively distinguishes mislabeled images from correctly labeled ones

3. Threshold calibration using in-class samples enhances detection accuracy

Abstract

Gathering enough images to train a deep computer vision model is a constant challenge. Unfortunately, collecting images from unknown sources can leave your model's behavior at risk of being manipulated by a dirty-label or clean-label attack unless the images are properly inspected. Manually inspecting each image-label pair is impractical, and common poison-detection methods that involve re-training your model can be time-consuming. This research uses GAN discriminators to protect a single class against mislabeled images and images modified at different perturbation levels. The effect of this perturbation on a basic convolutional neural network classifier is also included for reference. The results suggest that, after training on a single class, a GAN discriminator's confidence scores can provide a threshold to identify mislabeled images and identify 100% of the tested poison starting at a perturbation epsilon…
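The thresholding idea in the abstract can be illustrated with a short sketch. This is not the paper's implementation: the function names `calibrate_threshold` and `flag_suspects`, the mean-minus-k-standard-deviations rule, and all score values are assumptions chosen for illustration. The sketch assumes a discriminator trained on one class has already produced a confidence score in [0, 1] for each image.

```python
from statistics import mean, stdev


def calibrate_threshold(in_class_scores, k=2.0):
    """Return a cutoff k sample standard deviations below the mean in-class score.

    The mean-minus-k-stdev rule is an assumption for this sketch,
    not the calibration rule reported in the paper.
    """
    return mean(in_class_scores) - k * stdev(in_class_scores)


def flag_suspects(scores, threshold):
    """Return indices of samples whose confidence score falls below the cutoff."""
    return [i for i, score in enumerate(scores) if score < threshold]


# Hypothetical discriminator confidences; these are not results from the paper.
trusted_scores = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92]  # known in-class images
incoming_scores = [0.90, 0.35, 0.89, 0.52]             # images of unknown origin

threshold = calibrate_threshold(trusted_scores)
suspects = flag_suspects(incoming_scores, threshold)
print(suspects)  # [1, 3]
```

Calibrating on trusted in-class samples ties the cutoff to how the discriminator scores genuine members of the protected class, so images scoring well below that range are set aside for inspection rather than silently added to the training set.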

Taxonomy

Topics: Spam and Phishing Detection · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques