FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning
Khurram Khalil, Khaza Anuarul Hoque

TL;DR
FlipLLM introduces a reinforcement learning-based framework that efficiently identifies critical bit-flip vulnerabilities in multimodal large models, enabling rapid assessment and hardware-level defense strategies.
Contribution
We propose FlipLLM, a scalable RL framework that generalizes across models to efficiently discover minimal, impactful bit-flip vulnerabilities in large multimodal models.
Findings
FlipLLM finds critical bits up to 2.5x faster than state-of-the-art methods.
Flipping identified bits drastically reduces model accuracy, e.g., LLaMA 3.1 drops from 69.9% to 0.2%.
Hardware protections like ECC SECDED can mitigate identified vulnerabilities.
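To make the SECDED finding concrete, the sketch below implements a toy extended Hamming (8,4) code: 4 data bits protected by 3 Hamming parity bits plus one overall parity bit. It is an illustrative assumption, not the ECC configuration evaluated in the paper; memory ECC commonly uses a wider (72,64) code built on the same principle. The function names `encode` and `decode` are hypothetical.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit SECDED codeword (extended Hamming)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8                      # index 0: overall parity, 1..7: Hamming positions
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # parity over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # parity over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # parity over positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2             # makes the whole word even parity
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (data, status); status is 'ok', 'corrected', or 'double_error'."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    parity_odd = sum(bits) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        bits[syndrome] ^= 1             # single flip; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return None, "double_error"     # two flips: detectable but not correctable
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, status


codeword = encode(0b1011)
print(decode(codeword ^ (1 << 5)))               # one flipped bit
print(decode(codeword ^ (1 << 5) ^ (1 << 2)))    # two flipped bits
```

A code of this kind corrects one flipped bit per protected word and detects two. Three or more flips in the same word can be miscorrected or go unnoticed, so how the targeted bits are distributed across words matters.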
Abstract
Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs). Existing BFA discovery methods lack generalizability and struggle to scale, often failing to analyze the vast parameter space and complex interdependencies of modern foundation models in a reasonable time. This paper proposes FlipLLM, an architecture-agnostic reinforcement learning (RL) framework that formulates BFA discovery as a sequential decision-making problem. FlipLLM combines sensitivity-guided layer pruning with Q-learning to efficiently identify minimal, high-impact bit sets that can induce catastrophic failure. We demonstrate the effectiveness and generalizability of FlipLLM by applying it to a diverse set of models, including prominent text-only…
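The two-stage idea in the abstract (prune by sensitivity, then search with Q-learning) can be sketched on a toy problem. Everything below is an assumption made for illustration and is not the authors' implementation: the "model" is a six-weight int8 linear classifier, weight groups stand in for layers, sensitivity is the accuracy drop from flipping a sign bit, the Q-learning state is the set of bits flipped so far, and the reward is the accuracy drop after each flip. All function names are hypothetical.

```python
import random


def flip_bit(w, bit):
    """Flip one bit of an int8 weight stored in two's complement."""
    u = (w & 0xFF) ^ (1 << bit)
    return u - 256 if u >= 128 else u


def accuracy(weights, data):
    """Fraction of samples whose predicted sign matches the label."""
    hits = 0
    for x, label in data:
        score = sum(w * xi for w, xi in zip(weights, x))
        hits += (score >= 0) == label
    return hits / len(data)


def apply_flips(weights, flips):
    """Return a copy of the weights with each (index, bit) flip applied."""
    out = list(weights)
    for idx, bit in flips:
        out[idx] = flip_bit(out[idx], bit)
    return out


def prune_groups(weights, groups, data, keep):
    """Keep the groups whose worst single sign-bit flip hurts accuracy most."""
    base = accuracy(weights, data)

    def sensitivity(group):
        return max(base - accuracy(apply_flips(weights, [(i, 7)]), data)
                   for i in group)

    return sorted(groups, key=sensitivity, reverse=True)[:keep]


def q_learn_flips(weights, candidates, data, budget=2, episodes=300,
                  alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over sets of flipped bits; returns a greedy bit set."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> value, state = frozenset of flips so far

    def best(state):
        free = [a for a in candidates if a not in state]
        return max(free, key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        state, acc = frozenset(), accuracy(weights, data)
        for step in range(budget):
            free = [a for a in candidates if a not in state]
            action = rng.choice(free) if rng.random() < eps else best(state)
            nxt = state | {action}
            new_acc = accuracy(apply_flips(weights, nxt), data)
            reward = acc - new_acc          # reward = accuracy lost by this flip
            future = 0.0
            if step + 1 < budget:
                future = max(q.get((nxt, a), 0.0)
                             for a in candidates if a not in nxt)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state, acc = nxt, new_acc

    state = frozenset()
    for _ in range(budget):                 # greedy rollout with the learned Q
        state = state | {best(state)}
    return sorted(state)


# Toy data: labels come from the clean model, so clean accuracy is 1.0.
data_rng = random.Random(1)
weights = [40, -35, 3, -2, 1, 2]
groups = [[0, 1], [2, 3], [4, 5]]
xs = [[data_rng.randint(-5, 5) for _ in weights] for _ in range(50)]
data = [(x, sum(w * xi for w, xi in zip(weights, x)) >= 0) for x in xs]

kept = prune_groups(weights, groups, data, keep=1)
candidates = [(i, b) for g in kept for i in g for b in range(8)]
flips = q_learn_flips(weights, candidates, data)
attacked_acc = accuracy(apply_flips(weights, flips), data)
print("kept groups:", kept)
print("chosen (weight index, bit) flips:", flips)
print("accuracy before/after:", accuracy(weights, data), attacked_acc)
```

Pruning first shrinks the action space from every bit in every group to the bits in the retained groups, which is what keeps the Q-table small in this sketch. A real model would need a far cheaper sensitivity estimate than re-evaluating accuracy per weight.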
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Wireless Signal Modulation Classification
