Targeted Bit-Flip Attacks on LLM-Based Agents

Jialai Wang; Ya Wen; Zhongmou Liu; Yuxiao Wu; Bingyi He; Zongpeng Li; Ee-Chien Chang

arXiv:2603.10042 · cs.CR · March 12, 2026

Open Access

TL;DR

This paper introduces Flip-Agent, a novel targeted bit-flip attack framework that exploits hardware faults to manipulate large language model-based agents, revealing significant security vulnerabilities in multi-stage AI systems.

Contribution

The work presents the first targeted BFA framework for LLM-based agents, demonstrating its effectiveness and exposing new security risks in complex AI pipelines.

Findings

01. Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks

02. LLM-based agent systems are critically vulnerable to targeted bit flips in model parameters

03. The attack manipulates both final outputs and tool invocations

Abstract

Targeted bit-flip attacks (BFAs) exploit hardware faults to manipulate model parameters, posing a significant security threat. While prior work targets single-step inference models (e.g., image classifiers), LLM-based agents with multi-stage pipelines and external tools present new attack surfaces, which remain unexplored. This work introduces Flip-Agent, the first targeted BFA framework for LLM-based agents, manipulating both final outputs and tool invocations. Our experiments show that Flip-Agent significantly outperforms existing targeted BFAs on real-world agent tasks, revealing a critical vulnerability in LLM-based agent systems.
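
To make the underlying fault model concrete, the sketch below shows how flipping a single bit in a stored weight changes its value. It is not Flip-Agent's method: the abstract does not describe how bits are selected, and the helper names `flip_bit_int8` and `flip_bit_float32`, as well as the choice of int8 and float32 storage formats, are assumptions made for illustration only.

```python
import struct


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit (0 = least significant, 7 = sign bit) of a signed 8-bit integer."""
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    raw = value & 0xFF          # two's-complement byte pattern
    raw ^= 1 << bit             # the simulated hardware fault
    return raw - 256 if raw >= 128 else raw


def flip_bit_float32(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant mantissa bit, 31 = sign bit) of an IEEE 754 float32."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit             # the simulated hardware fault
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


if __name__ == "__main__":
    # Hypothetical weights, chosen only to show the size of the effect.
    print(flip_bit_int8(3, 7))          # sign bit of a small int8 weight
    print(flip_bit_float32(0.5, 30))    # top exponent bit of a float32 weight
```

In this sketch, flipping the sign bit of the int8 value 3 yields -125, and flipping the top exponent bit of the float32 value 0.5 yields 2^127. Changes of this magnitude from a single bit are what make a small number of well-chosen flips a plausible way to alter model behavior.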

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Physical Unclonable Functions (PUFs) and Hardware Security