Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei; Ming Fan; Guoheng Sun; Xicheng Zhang; Haijun Wang; Ting Liu

arXiv:2604.21700 · cs.CR · April 24, 2026


TL;DR

This paper introduces BadStyle, a backdoor attack framework for LLMs that uses natural style triggers and achieves high attack success rates while remaining stealthy and robust against existing defenses.

Contribution

BadStyle uses an LLM to generate natural poisoned samples that carry imperceptible style-level triggers, improving attack stability and effectiveness under realistic threat models.

Findings

1. High attack success rates across seven LLMs.
2. An auxiliary target loss improves the stability of backdoor activation.
3. The backdoor remains effective in downstream deployment scenarios.

Abstract

The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor attacks against LLMs. However, existing methods suffer from three key shortcomings: explicit trigger patterns that compromise naturalness, unreliable injection of attacker-specified payloads in long-form generation, and incompletely specified threat models that obscure how backdoors are delivered and activated in practice. To address these gaps, we present BadStyle, a complete backdoor attack framework and pipeline. BadStyle leverages an LLM as a poisoned sample generator to construct natural and stealthy poisoned samples that carry imperceptible style-level triggers while preserving semantics and fluency. To stabilize payload injection during fine-tuning, we design an auxiliary target loss…
