FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning

Khurram Khalil; Khaza Anuarul Hoque

arXiv:2512.09872 · cs.CR · December 11, 2025

Open Access

TL;DR

FlipLLM introduces a reinforcement learning-based framework that efficiently identifies critical bit-flip vulnerabilities in multimodal large models, enabling rapid assessment and hardware-level defense strategies.

Contribution

We propose FlipLLM, a scalable RL framework that generalizes across models to efficiently discover minimal, impactful bit-flip vulnerabilities in large multimodal models.

Findings

01. FlipLLM finds critical bits up to 2.5x faster than state-of-the-art methods.

02. Flipping the identified bits drastically reduces model accuracy; for example, LLaMA 3.1 drops from 69.9% to 0.2%.

03. Hardware protections such as ECC with SECDED (single-error correction, double-error detection) can mitigate the identified vulnerabilities.
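
To make the SECDED finding concrete, here is a minimal sketch of an extended Hamming (8,4) code, the smallest SECDED code. It is only an illustration: the summary above does not say which ECC configuration the paper evaluates, and memory ECC typically protects 64 data bits with 8 check bits rather than 4 with 4. The function names `encode` and `decode` are hypothetical.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    code[0] = sum(code[1:]) & 1            # make the total parity even
    return sum(b << i for i, b in enumerate(code))


def decode(word: int):
    """Return (nibble, status); nibble is None when a double error is detected."""
    code = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    parity_broken = sum(code) & 1
    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        # Odd number of flips: assume a single flip at `syndrome`
        # (syndrome 0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with intact overall parity: two bits flipped.
        return None, "double_error_detected"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    cw = encode(0b1011)
    print(decode(cw))                # clean word
    print(decode(cw ^ 0b00100000))   # one flipped bit is corrected
    print(decode(cw ^ 0b00100100))   # two flipped bits are detected only
```

A single flipped bit per codeword is repaired transparently and two flips are flagged, which is why an attack that relies on a handful of isolated bit flips can be blunted by this kind of protection. Three or more flips in the same codeword are outside what SECDED guarantees.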

Abstract

Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs). Existing BFA discovery methods lack generalizability and struggle to scale, often failing to analyze the vast parameter space and complex interdependencies of modern foundation models in a reasonable time. This paper proposes FlipLLM, an architecture-agnostic reinforcement learning (RL) framework that formulates BFA discovery as a sequential decision-making problem. FlipLLM combines sensitivity-guided layer pruning with Q-learning to efficiently identify minimal, high-impact bit sets that can induce catastrophic failure. We demonstrate the effectiveness and generalizability of FlipLLM by applying it to a diverse set of models, including prominent text-only…
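
The idea of treating bit-flip discovery as sequential decision-making can be sketched on a toy problem. The code below is not the paper's implementation: the four-weight int8 linear classifier, the sensitivity proxy (|weight| times mean |input|), the reward (accuracy drop per flip), the state encoding (tuple of flips so far), and all hyperparameters are assumptions chosen so the example runs in a fraction of a second.

```python
import random
from collections import defaultdict

import numpy as np

# Toy int8 "model": a linear classifier with made-up weights and inputs.
W = np.array([64, 3, 2, 1], dtype=np.int8)
X = np.array([[2, 1, 0, 0], [3, 0, 1, 0], [-2, 0, 0, 1], [-3, 1, 0, 0]],
             dtype=np.int32)


def predict(w):
    return (X @ w.astype(np.int32) > 0).astype(int)


Y = predict(W)  # the clean model's predictions serve as labels


def accuracy(w):
    return float((predict(w) == Y).mean())


def flip_bit(w, idx, bit):
    """Return a copy of w with one bit of the int8 weight at idx flipped."""
    out = w.copy()
    out.view(np.uint8)[idx] ^= np.uint8(1 << bit)
    return out


# Sensitivity-guided pruning (assumed proxy): keep the 2 most sensitive weights.
sens = np.abs(W.astype(np.int32)) * np.abs(X).mean(axis=0)
keep = [int(i) for i in np.argsort(-sens)[:2]]
ACTIONS = [(i, b) for i in keep for b in range(8)]  # (weight index, bit index)


def train(episodes=3000, budget=2, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning; an episode applies up to `budget` bit flips."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state, w, acc = (), W, accuracy(W)
        for step in range(budget):
            legal = [a for a in ACTIONS if a not in state]
            if rng.random() < eps:
                a = rng.choice(legal)                          # explore
            else:
                a = max(legal, key=lambda x: Q[(state, x)])    # exploit
            w = flip_bit(w, *a)
            new_acc = accuracy(w)
            reward = acc - new_acc  # accuracy lost by this flip
            nxt = state + (a,)
            if step == budget - 1:
                future = 0.0        # episode ends, nothing to bootstrap from
            else:
                future = max(Q[(nxt, x)] for x in ACTIONS if x not in nxt)
            Q[(state, a)] += alpha * (reward + gamma * future - Q[(state, a)])
            state, acc = nxt, new_acc
    return Q


if __name__ == "__main__":
    Q = train()
    best = max(ACTIONS, key=lambda a: Q[((), a)])
    print("first flip chosen:", best)
    print("accuracy after that flip:", accuracy(flip_bit(W, *best)))
```

In this toy setting the agent learns to flip the sign bit of the dominant weight, which inverts every prediction with a single flip. Pruning the action space before learning is what keeps the Q-table small; at the scale of a real model the same principle would have to operate on layers and far larger candidate sets.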

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Wireless Signal Modulation Classification