BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
Xiaobei Yan, Yiming Li, Hao Wang, Han Qiu, Tianwei Zhang

TL;DR
This paper introduces BitHydra, a novel method to increase inference costs of large language models by strategically flipping model bits, enabling endless output generation with minimal modifications.
Contribution
We propose BitHydra, a framework that formulates and solves a binary optimization problem to maximize LLM inference costs through targeted weight bit flips, using ADMM for efficient solution.
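As a rough illustration of what such a binary optimization problem could look like, the block below writes one plausible form. Every symbol is an assumption introduced here and is not taken from the paper: $\mathbf{b}_0$ is the original bit string of the targeted weights, $\mathbf{b}$ the tampered one, $k$ the flip budget, $\mathcal{D}$ a small set of sample prompts, and $\theta(\mathbf{b})$ the weights decoded from the bits.

```latex
% Hypothetical sketch of an <eos>-suppression objective; not the paper's exact formulation.
\min_{\mathbf{b} \in \{0,1\}^{n}} \;
  \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \sum_{t=1}^{T}
  p_{\theta(\mathbf{b})}\!\left(\texttt{<eos>} \mid x,\, y_{<t}\right)
\quad \text{s.t.} \quad
  \lVert \mathbf{b} - \mathbf{b}_0 \rVert_0 \le k
```

Under this reading, the $\ell_0$ constraint caps the number of flipped bits, and the binary domain is what makes the problem combinatorial and motivates a splitting solver such as ADMM.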
Findings
Achieves endless generation with 1-4 bit flips on models from 1.5B to 16B parameters.
Effectively suppresses end-of-sequence probability to prolong output.
Transfers to unseen prompts and remains effective against simple defenses such as fine-tuning and weight reconstruction.
Abstract
Large language models (LLMs) are widely deployed, but their substantial compute demands make them vulnerable to inference cost attacks that aim to deliberately maximize the output length. In this work, we investigate a distinct attack surface: maximizing inference cost by tampering with the model parameters instead of inputs. This approach leverages the established capability of Bit-Flip Attacks (BFAs) to persistently alter model behavior via minute weight perturbations, effectively decoupling the attack from specific input queries. To realize this, we propose BitHydra, a framework that addresses the unique optimization challenge of identifying the exact weight bits that maximize generation cost. We formulate the attack as a constrained Binary Integer Programming (BIP) problem designed to systematically suppress the end-of-sequence (i.e., <eos>) probability. To overcome the…
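To make the mechanism concrete, here is a minimal toy sketch of how a single flipped bit in one weight can collapse the `<eos>` probability. It assumes an int8-quantized output layer with a shared scale; the three-token vocabulary, the weight values, the hidden state, and all names (`flip_bit_int8`, `eos_probability`, `W_Q`, `SCALE`) are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math


def flip_bit_int8(w: int, bit: int) -> int:
    """Flip one bit (0 = LSB, 7 = sign bit) of a two's-complement int8 value."""
    if not -128 <= w <= 127 or not 0 <= bit <= 7:
        raise ValueError("w must fit in int8 and bit must be in 0..7")
    raw = (w & 0xFF) ^ (1 << bit)  # reinterpret as an unsigned byte, then flip
    return raw - 256 if raw >= 128 else raw  # back to the signed value


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def eos_probability(w_q, scale, h, eos_id):
    """Probability of <eos> for a toy linear output layer with int8 weights."""
    logits = [scale * sum(w * x for w, x in zip(row, h)) for row in w_q]
    return softmax(logits)[eos_id]


# Toy setup (all values are made up): 3 tokens, 2 hidden units, <eos> is token 2.
SCALE = 0.05
H = [1.0, 2.0]
EOS_ID = 2
W_Q = [[10, 5], [3, 8], [20, 30]]

p_before = eos_probability(W_Q, SCALE, H, EOS_ID)

# Flip only the sign bit of one weight in the <eos> row: 30 becomes -98.
W_FLIPPED = [row[:] for row in W_Q]
W_FLIPPED[EOS_ID][1] = flip_bit_int8(W_Q[EOS_ID][1], 7)
p_after = eos_probability(W_FLIPPED, SCALE, H, EOS_ID)

print(f"p(<eos>) before flip: {p_before:.4f}")
print(f"p(<eos>) after flip:  {p_after:.6f}")
```

In this toy setting the `<eos>` logit drops from 4.0 to -8.8 after one flip, so the token is almost never sampled. Picking which bit to flip in a real model is the hard part that the optimization addresses; the sketch only shows why a high-order bit can matter so much.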
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The method was seemingly effective in eliciting long generations from models.
- The authors introduce and defend their method for finding susceptible weights with reasonable ablations.
- The paper introduces a new class of threats to running LLM inference in a shared environment.
- **Threat model**. The attacker is assumed to have access to the weights of the defender’s LLM and to be an unprivileged user on the same machine as the defender. I find the latter assumption reasonable; as noted, for example, the attacker may be using an MLaaS platform with the defender. However, the former is a more significant assumption and therefore should be explicitly stated in the abstract/intro. Similarly, in the threat model discussion, the paper incorrectly notes that it “adopts the sa
- Hardware attacks are an emerging threat to larger LLM deployments that have not been heavily investigated so far, making this work timely.
- The fact that, in many cases, only a few bitflips are sufficient to suppress the <eos> token is independently interesting and raises further questions about the brittleness of LLM decoding.
- Given their setting, the evaluation is mostly comprehensive, and the transferability across prompts is an interesting property. The lack of correlation between mod
- In my opinion, the main problem of the presented work is its threat model and the corresponding idealized assumptions. While this paper somewhat positions itself as "towards bitflip attacks," there are many assumptions underlying this attack that make it problematic in reality. In rough order:
  - We assume that a large enough API-provider deployment is machine co-tenant with the adversary. This clearly does not hold for any of the very large providers, but one might be able to define some sc
1. The paper explores an unconventional perspective by shifting inference-cost attacks from input-space manipulation to parameter-space corruption, which is conceptually new for LLMs.
2. It combines ideas from Rowhammer-style bit-flip attacks and inference efficiency degradation, two previously separate domains.
3. Experiments demonstrate the attack’s transferability across unseen prompts and apparent robustness against simple defenses (fine-tuning and weight reconstruction).
1. The assumed adversary model is impractical: an unprivileged tenant performing targeted Rowhammer flips on specific bits within a cloud-deployed LLM’s DRAM without triggering any integrity check is highly speculative. The attack requires white-box access to model weights (architecture and parameters) to compute gradients—this assumption contradicts real-world MLaaS scenarios where attackers do not have such access. In the paper, there is no evidence or simulation showing that Rowhammer can tar
1. **Extensive analysis**: The main results cover 11 LLMs. The analysis ranges from a quantitative discussion of those results to a qualitative analysis of the attack (e.g., Additional Attack surface in lines 429-431).
2. **Presentation**: The paper is very well organized and easy to follow. I especially liked Section 4, where they propose their method, keeping a good balance between intuitive description and rigorous explanation.
1. **Technical comparison with Prisonbreak** (Section 5.1, Table 1): In their experiment, they compare against their own baseline method inspired by Prisonbreak, another bit-flip attack aimed at jailbreaking. I like this comparison. However, the current draft lacks a detailed description of it. I ask the authors to describe the baseline approach in detail and explain how BitHydra differs from it (and why the difference is necessary, considering their objective).
2. **Hyperparam
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: FLIP
