Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Yifei Wang; Tianlin Li; Xiaohan Zhang; Yida Yang; Xiaoyu Zhang; Li Pan

arXiv:2605.20641 · cs.CR · May 21, 2026

TL;DR

This paper shows how compilation, the most widely adopted inference optimization for large language models, can be maliciously exploited to implant stealthy backdoors that bypass standard safety checks, posing a new security risk.

Contribution

It introduces a unified attack framework that exploits the numerical side effects of compilation to trigger backdoors in LLMs, without modifying the hardware or the compiler.
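One well-known source of such numerical side effects is that floating-point addition is not associative: an optimizer that fuses or reorders a reduction can produce a result that differs slightly from eager execution, even though the two computations are mathematically equivalent. The sketch below illustrates only this general effect, not the paper's method; `sequential_sum` and `pairwise_sum` are hypothetical stand-ins for an eager and a reordered reduction.

```python
def sequential_sum(xs: list[float]) -> float:
    """Left-to-right accumulation, as a naive eager loop would do."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs: list[float]) -> float:
    """Tree-shaped accumulation, one reordering an optimizer might choose."""
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


values = [0.1, 0.2, 0.3]
eager = sequential_sum(values)    # computes (0.1 + 0.2) + 0.3
reordered = pairwise_sum(values)  # computes 0.1 + (0.2 + 0.3)
print(eager, reordered, eager == reordered)  # prints: 0.6000000000000001 0.6 False
```

The two results differ by a single unit in the last place. Such a discrepancy is harmless for most inputs but can matter when a value sits right at a decision threshold.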

Findings

01

Backdoors achieve a 90% attack success rate across multiple LLMs and tasks.

02

Clean accuracy remains nearly 100% despite the implanted backdoors.

03

Both attacks bypass standard safety evaluations run without compilation.

Abstract

Inference optimization is vital for deploying LLMs at scale, and compilation is the most widely adopted optimization technique for LLMs. Although compilation assumes semantic equivalence between the original and compiled graphs, we are the first to show that its numerical side effects can be maliciously exploited to implant stealthy backdoors in LLMs. We propose a unified optimization-triggered attack framework comprising two complementary strategies. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations run without compilation. We empirically demonstrate that these optimization-triggered backdoors achieve attack…
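To make the idea of a prediction that flips "only when the model is compiled" more concrete, the toy sketch below shows an input lying exactly on a decision boundary that receives different labels under two mathematically equivalent summation orders. This is a hypothetical illustration of the underlying numerical effect only: the linear scorer, the weights, the bias of -0.6, and the helper names `eager_reduce`, `fused_reduce`, and `predict` are all assumptions. The paper's attacks target LLMs and real compiler transformations, not a hand-written reduction.

```python
from typing import Callable, Sequence


def eager_reduce(terms: Sequence[float]) -> float:
    """Left-to-right accumulation (hypothetical stand-in for uncompiled execution)."""
    total = 0.0
    for t in terms:
        total += t
    return total


def fused_reduce(terms: Sequence[float]) -> float:
    """Tree-shaped accumulation (hypothetical stand-in for a reordered compiled kernel)."""
    if not terms:
        return 0.0
    if len(terms) == 1:
        return terms[0]
    mid = len(terms) // 2
    return fused_reduce(terms[:mid]) + fused_reduce(terms[mid:])


def predict(x: Sequence[float], w: Sequence[float], bias: float,
            reduce: Callable[[Sequence[float]], float]) -> int:
    """Binary linear classifier: label 1 iff w.x + bias > 0."""
    score = reduce([wi * xi for wi, xi in zip(w, x)]) + bias
    return 1 if score > 0.0 else 0


# Hypothetical input placed on the boundary w.x + bias = 0 in exact arithmetic.
w = [1.0, 1.0, 1.0]
x = [0.1, 0.2, 0.3]
bias = -0.6

print(predict(x, w, bias, eager_reduce))  # prints 1: score is about +1.1e-16
print(predict(x, w, bias, fused_reduce))  # prints 0: score is exactly 0.0
```

The input here was chosen by hand to sit on the boundary. An attack in the spirit of the abstract would instead need to shape the model so that such discrepancies reliably produce attacker-chosen behavior, which this toy does not attempt.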
