Invisible Hands: Gray-Box Bit Flip Attack for Steering LLMs Without Knowledge of Gradients, Data, and Weights

Abeer Matar A. Almalky; Ziyan Wang; Mohaiminul Al Nahian; Li Yang; Adnan Siraj Rakin

arXiv:2511.22700·cs.CR·April 28, 2026

TL;DR

This paper introduces Invisible Hands, a gray-box bit flip attack method that efficiently compromises large language models without needing access to gradients, data, or weights.

Contribution

It presents the first gray-box BFA framework for LLMs that estimates vulnerability without data or gradient access, reducing memory overhead and increasing practicality.

Findings

01. Effective attack achieved with minimal weight perturbations

02. Scales efficiently across six open-source LLMs

03. Reduces memory overhead compared to white-box methods

Abstract

In recent years, large language models (LLMs) have achieved remarkable advances and are increasingly deployed in critical applications across diverse domains. This growing adoption raises urgent concerns about their security and robustness. In this work, we investigate the impact of Bit Flip Attacks (BFAs) on LLMs, which exploit hardware faults to corrupt model parameters, thereby threatening model integrity and performance. Existing BFA studies primarily assume a white-box setting with access to exact model weights and part of the dataset, and rely on progressive gradient-based bit-search strategies to identify vulnerable bits in model weights. However, gradient computation for LLMs is computationally expensive and memory intensive. In addition, assuming access to exact victim model weights and datasets is challenging due to increasingly strict user privacy regulations. To address…
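To make the threat concrete, the sketch below shows what a single bit flip does to one stored weight. It is not the paper's attack or its bit-search procedure; it only illustrates why corrupting a few well-chosen bits can matter. The helper names `flip_bit_int8` and `flip_bit_float32` are hypothetical, and the choice of an 8-bit two's-complement format and IEEE 754 float32 as weight encodings is an assumption made for illustration.

```python
import struct


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit (0 = least significant, 7 = sign) of a two's-complement int8."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in int8")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = (value & 0xFF) ^ (1 << bit)  # work on the unsigned byte pattern
    return raw - 256 if raw >= 128 else raw  # reinterpret as signed


def flip_bit_float32(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of an IEEE 754 float32."""
    if not 0 <= bit <= 31:
        raise ValueError("bit index must be in 0..31")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


if __name__ == "__main__":
    # Flipping the sign bit of a small int8 weight gives a large negative value.
    print(flip_bit_int8(3, 7))         # -125
    # Flipping the lowest bit barely changes the weight.
    print(flip_bit_int8(3, 0))         # 2
    # Flipping the top exponent bit of 1.0 yields infinity in float32.
    print(flip_bit_float32(1.0, 30))   # inf
```

The contrast between the high-order and low-order flips is the reason bit flip attacks search for vulnerable bits instead of flipping bits at random: most flips are nearly harmless, while a few change a weight by orders of magnitude.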
