Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification

Onur Alp Kirci; M. Emre Gursoy

arXiv:2508.15934·cs.CR·August 25, 2025

TL;DR

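This paper introduces three sample selection strategies (Minimum, Above50, and Below50) for clean-label backdoor attacks on text classifiers. Injecting triggers into samples that the model misclassifies or classifies with low confidence raises attack success rates with minimal impact on clean accuracy.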

Contribution

The paper proposes sample selection methods that make clean-label backdoor attacks on NLP text classifiers more effective, outperforming state-of-the-art clean-label attack methods.

Findings

01 Strategies significantly improve attack success rate

02 Minimal impact on model's clean accuracy

03 Outperforms state-of-the-art clean-label attack methods

Abstract

Backdoor attacks pose a significant threat to the integrity of text classification models used in natural language processing. While several dirty-label attacks that achieve high attack success rates (ASR) have been proposed, clean-label attacks are inherently more difficult. In this paper, we propose three sample selection strategies to improve attack effectiveness in clean-label scenarios: Minimum, Above50, and Below50. Our strategies identify those samples which the model predicts incorrectly or with low confidence, and by injecting backdoor triggers into such samples, we aim to induce a stronger association between the trigger patterns and the attacker-desired target label. We apply our methods to clean-label variants of four canonical backdoor attacks (InsertSent, WordInj, StyleBkd, SynBkd) and evaluate them on three datasets (IMDB, SST2, HateSpeech) and four model types (LSTM,…
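The selection idea in the abstract can be illustrated with a minimal sketch. The abstract names the three strategies but does not define them here, so the rules below are assumptions inferred from the names: Minimum takes the target-class samples with the lowest confidence, Below50 takes those under 0.5 confidence that lie nearest the boundary, and Above50 takes those at or over 0.5 that lie nearest the boundary. The function names, the `target_conf` input (a reference model's probability for the target label), and the trigger sentence are all hypothetical; the trigger is a simple sentence insertion in the spirit of InsertSent.

```python
from typing import List, Sequence, Tuple

# Hypothetical trigger sentence, for illustration only (not taken from the paper).
HYPOTHETICAL_TRIGGER = "I watched this 3D movie."


def select_candidates(confidences: Sequence[float], budget: int,
                      strategy: str = "minimum") -> List[int]:
    """Return up to `budget` positions chosen by an assumed selection rule.

    confidences[i] is a reference model's probability that sample i
    belongs to the target label. The three rules are guesses based on
    the strategy names, not the paper's definitions.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    indexed = list(enumerate(confidences))
    if strategy == "minimum":
        # Lowest confidence first, regardless of side of the boundary.
        pool = sorted(indexed, key=lambda p: p[1])
    elif strategy == "below50":
        # Misclassified or uncertain samples, nearest to 0.5 first.
        pool = sorted((p for p in indexed if p[1] < 0.5),
                      key=lambda p: abs(p[1] - 0.5))
    elif strategy == "above50":
        # Correctly classified samples, nearest to 0.5 first.
        pool = sorted((p for p in indexed if p[1] >= 0.5),
                      key=lambda p: abs(p[1] - 0.5))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [i for i, _ in pool[:budget]]


def poison_clean_label(texts: Sequence[str], labels: Sequence[int],
                       target_conf: Sequence[float], target_label: int,
                       budget: int, strategy: str = "minimum",
                       trigger: str = HYPOTHETICAL_TRIGGER
                       ) -> Tuple[List[str], List[int]]:
    """Insert the trigger into selected target-class samples only.

    Labels are never modified, which is what makes the attack clean-label.
    Returns the poisoned texts and the sorted indices that were modified.
    """
    target_idx = [i for i, y in enumerate(labels) if y == target_label]
    picked = select_candidates([target_conf[i] for i in target_idx],
                               budget, strategy)
    chosen = {target_idx[j] for j in picked}
    poisoned = [f"{trigger} {t}" if i in chosen else t
                for i, t in enumerate(texts)]
    return poisoned, sorted(chosen)
```

Because only samples that already carry the target label are modified, the training labels stay consistent with the text. The selection step only changes which of those samples receive the trigger.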
