TFL: Targeted Bit-Flip Attack on Large Language Model

Jingkai Guo; Chaitali Chakrabarti; Deliang Fan

arXiv:2602.17837·cs.CR·February 23, 2026

TL;DR

TFL is a targeted bit-flip attack on large language models that manipulates the outputs for selected prompts with fewer than 50 bit flips and minimal collateral damage on unrelated inputs, which raises security concerns for deployed models.

Contribution

The paper presents TFL, a novel framework for precise, targeted bit-flip attacks on LLMs, enabling control over specific outputs while minimizing impact on unrelated inputs.

Findings

01

Achieves targeted output manipulation with fewer than 50 bit flips.

02

Significantly reduces unintended effects on benign queries.

03

Effective across multiple LLM architectures and benchmarks.

Abstract

Large language models (LLMs) are increasingly deployed in safety- and security-critical applications, raising concerns about their robustness to model parameter fault injection attacks. Recent studies have shown that bit-flip attacks (BFAs), which exploit vulnerabilities in computer main memory (i.e., DRAM) to flip a small number of bits in model weights, can severely disrupt LLM behavior. However, existing BFAs on LLMs largely induce untargeted failure or general performance degradation, offering limited control over specific, targeted outputs. In this paper, we present TFL, a novel targeted bit-flip attack framework that enables precise manipulation of LLM outputs for selected prompts while causing little to no degradation on unrelated inputs. Within our TFL framework, we propose a novel keyword-focused attack loss to promote attacker-specified target tokens in…
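To make concrete what "flipping a bit in a model weight" means, the sketch below shows how a single flipped bit changes one stored value. It is only an illustration of the fault model the abstract refers to, not the TFL method: the excerpt above does not state which weight formats the paper targets, so the int8 and float16 encodings here are assumptions, and the helper names `flip_bit_int8` and `flip_bit_fp16` are hypothetical.

```python
import struct


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of a signed 8-bit integer stored in two's complement.

    Assumption: weights are int8-quantized; the source does not confirm this.
    """
    if not -128 <= value <= 127:
        raise ValueError("value must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = (value & 0xFF) ^ (1 << bit)       # view as unsigned byte, toggle the bit
    return raw - 256 if raw >= 128 else raw  # reinterpret as signed


def flip_bit_fp16(value: float, bit: int) -> float:
    """Flip one bit of an IEEE 754 half-precision (float16) value.

    Bit 15 is the sign, bits 14..10 the exponent, bits 9..0 the mantissa.
    """
    if not 0 <= bit <= 15:
        raise ValueError("bit index must be in 0..15")
    (raw,) = struct.unpack("<H", struct.pack("<e", value))  # float16 -> 16-bit pattern
    raw ^= 1 << bit                                          # toggle the chosen bit
    (flipped,) = struct.unpack("<e", struct.pack("<H", raw))  # pattern -> float16
    return flipped


if __name__ == "__main__":
    # Flipping the most significant bit changes the value far more than
    # flipping the least significant bit.
    print(flip_bit_int8(3, 0))      # low bit of an int8 weight
    print(flip_bit_int8(3, 7))      # sign bit of an int8 weight
    print(flip_bit_fp16(0.5, 14))   # top exponent bit of a float16 weight
```

The example illustrates why a small number of flips can matter: toggling a high-order bit (the int8 sign bit or a float16 exponent bit) moves a weight by orders of magnitude, while toggling a low-order bit barely changes it. How TFL chooses which bits to flip is not described in the excerpt above and is not modeled here.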

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques