SteganoBackdoor: Stealthy and Data-Efficient Backdoor Attacks on Language Models

Eric Xue; Ruiyi Zhang; Pengtao Xie

arXiv:2511.14301 · cs.CR · January 6, 2026

Open Access

TL;DR

This paper introduces SteganoBackdoor, an optimization-based method for building stealthy backdoor attacks on language models. It uses steganography to hide the backdoor payload inside fluent poisoned training sentences that carry no obvious artifacts, and it remains effective with limited poisoned data.

Contribution

The paper presents a steganography-based framework for backdoor attacks that stay covert and effective across diverse model architectures and against data-filtering defenses.

Findings

01. High attack success rate with limited poisoned data

02. Effective against data filtering defenses

03. Steganographic triggers are indistinguishable from normal text
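
The findings above refer to attack success rate and to the amount of poisoned data. As a rough illustration of how these two quantities are commonly computed in backdoor evaluations, here is a minimal sketch. The function names `attack_success_rate` and `poisoning_rate` are hypothetical, and the definitions follow common usage in the backdoor literature; this page does not state the paper's exact evaluation protocol, so treat this as an assumption rather than the authors' method.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Share of triggered test inputs that the model maps to the target label.

    Assumption: inputs whose true label already equals the target are
    excluded, since predicting the target for them is not evidence of a
    backdoor. This is a common convention, not necessarily the paper's.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have equal length")
    # Keep only predictions for inputs that should NOT be the target class.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        raise ValueError("no eligible triggered inputs to evaluate")
    return sum(p == target_label for p in eligible) / len(eligible)


def poisoning_rate(num_poisoned: int, num_total: int) -> float:
    """Fraction of the training set made up of poisoned examples."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if not 0 <= num_poisoned <= num_total:
        raise ValueError("num_poisoned must be between 0 and num_total")
    return num_poisoned / num_total


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    preds = [1, 1, 0, 1]
    labels = [0, 0, 0, 1]
    print(attack_success_rate(preds, labels, target_label=1))
    print(poisoning_rate(50, 10_000))
```

A data-efficient attack in this sense is one that keeps the first number high while the second stays small.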

Abstract

Modern language models remain vulnerable to backdoor attacks via poisoned data, where training inputs containing a trigger are paired with a target output, causing the model to reproduce that behavior whenever the trigger appears at inference time. Recent work has emphasized stealthy attacks that stress-test data-curation defenses using stylized artifacts or token-level perturbations as triggers, but this focus leaves a more practically relevant threat model underexplored: backdoors tied to naturally occurring semantic concepts. We introduce SteganoBackdoor, an optimization-based framework that constructs SteganoPoisons, steganographic poisoned training examples in which a backdoor payload is distributed across a fluent sentence while exhibiting no representational overlap with the inference-time semantic trigger. Across diverse model architectures, SteganoBackdoor achieves high attack…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques