Detecting Semantic Backdoors in a Mystery Shopping Scenario

Arpad Berta; Gabor Danner; Istvan Hegedus; Mark Jelasity

arXiv:2601.03805·cs.LG·January 8, 2026

Open Access

TL;DR

This paper presents a method for detecting semantic backdoors in classification models: train trusted models to build a reference pool, then use model distance metrics to distinguish clean from poisoned models. Detection is most reliable when adversarial training is used.

Contribution

The authors introduce an approach to detecting semantic backdoors that combines trusted model training, model distance calibration, and adversarial training, and that outperforms existing detectors.

Findings

01

The most reliable detection uses adversarial training.

02

Model distances based on inverted input samples are effective.

03

The method often completely separates clean and poisoned models.
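
The general idea of comparing a candidate model against a trusted reference pool and flagging it when its distance exceeds a calibrated threshold can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tiny linear "models", the random probe inputs (the paper reports that inverted input samples work well, which this sketch does not implement), the total-variation distance, and the leave-one-out threshold with a hypothetical `margin` parameter.

```python
import math
import random
from statistics import mean

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_model(weights):
    """Toy stand-in for a trained classifier (one weight row per class)."""
    def predict(x):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        return softmax(logits)
    return predict

def perturb(weights, rng, scale):
    """Mimic run-to-run training variation with small bounded noise."""
    return [[w + rng.uniform(-scale, scale) for w in row] for row in weights]

def model_distance(f, g, probes):
    """Assumed metric: mean total-variation distance of outputs on the probes."""
    return mean(0.5 * sum(abs(p - q) for p, q in zip(f(x), g(x))) for x in probes)

def pool_score(candidate, pool, probes):
    """Average distance from the candidate to every reference model."""
    return mean(model_distance(candidate, ref, probes) for ref in pool)

def calibrate_threshold(clean_pool, probes, margin=1.5):
    """Assumed rule: margin times the largest leave-one-out score among clean models."""
    scores = [
        pool_score(m, clean_pool[:i] + clean_pool[i + 1:], probes)
        for i, m in enumerate(clean_pool)
    ]
    return margin * max(scores)

def is_suspicious(candidate, clean_pool, probes, threshold):
    return pool_score(candidate, clean_pool, probes) > threshold

rng = random.Random(0)
base = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
probes = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]

# Reference pool: models trained by the authority on trusted infrastructure.
clean_pool = [make_model(perturb(base, rng, 0.01)) for _ in range(5)]
threshold = calibrate_threshold(clean_pool, probes)

# Hypothetical poisoned model: one weight is shifted strongly.
poisoned_weights = perturb(base, rng, 0.01)
poisoned_weights[0][0] += 5.0
poisoned = make_model(poisoned_weights)

print("poisoned flagged:", is_suspicious(poisoned, clean_pool, probes, threshold))
print("pool member flagged:", is_suspicious(clean_pool[0], clean_pool, probes, threshold))
```

In this sketch the threshold depends only on clean reference models; the abstract does not show how the paper calibrates its threshold, so this rule should be read as a placeholder.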

Abstract

Detecting semantic backdoors in classification models--where some classes can be activated by certain natural, but out-of-distribution inputs--is an important problem that has received relatively little attention. Semantic backdoors are significantly harder to detect than backdoors that are based on trigger patterns due to the lack of such clearly identifiable patterns. We tackle this problem under the assumption that the clean training dataset and the training recipe of the model are both known. These assumptions are motivated by a consumer protection scenario, in which the responsible authority performs mystery shopping to test a machine learning service provider. In this scenario, the authority uses the provider's resources and tools to train a model on a given dataset and tests whether the provider included a backdoor. In our proposed approach, the authority creates a reference…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)