Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun

TL;DR
Q-MLLM introduces a vector quantization-based architecture that enhances the security of multimodal large language models by effectively defending against adversarial visual attacks while maintaining reasoning capabilities.
Contribution
The paper proposes a two-level vector quantization method that discretizes visual representations at the pixel-patch and semantic levels, blocking gradient-based attack pathways through visual inputs and improving robustness against adversarial threats.
Findings
Achieves 100% defense success rate against jailbreak attacks.
Maintains competitive utility performance with minimal inference overhead.
Establishes vector quantization as an effective defense for multimodal AI security.
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in cross-modal understanding, but remain vulnerable to adversarial attacks through visual inputs despite robust textual safety mechanisms. These vulnerabilities arise from two core weaknesses: the continuous nature of visual representations, which allows for gradient-based attacks, and the inadequate transfer of text-based safety mechanisms to visual content. We introduce Q-MLLM, a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. By discretizing visual representations at both pixel-patch and semantic levels, Q-MLLM blocks attack pathways and bridges the cross-modal safety alignment gap. Our two-stage training methodology ensures robust learning while maintaining model utility.…
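The discrete bottleneck described in the abstract can be illustrated with a toy nearest-neighbour quantizer. This is a minimal sketch, not the authors' implementation: the codebooks, feature values, perturbation size, and the mean-pooling step used for the "semantic" level are all hypothetical choices made for illustration. It shows only the general property that a small continuous change in the features can leave the discrete code indices unchanged.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def mean_pool(vectors: List[Vector]) -> List[float]:
    """Average a list of vectors into one vector (assumed pooling choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical toy codebooks (assumptions, not taken from the paper).
PATCH_CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
SEMANTIC_CODEBOOK = [(0.0, 0.5), (0.0, -0.5)]

# Hypothetical patch-level features for one image.
patch_features = [(0.1, -0.1), (0.9, 1.2), (-0.8, 0.7)]

# The same features after a small shift, standing in for an adversarial nudge.
perturbed_features = [(x + 0.05, y - 0.05) for x, y in patch_features]

# Level 1: quantize every patch feature independently.
clean_codes = quantize(patch_features, PATCH_CODEBOOK)
perturbed_codes = quantize(perturbed_features, PATCH_CODEBOOK)

# Level 2: quantize one pooled, image-level feature (assumed design).
clean_semantic = quantize([mean_pool(patch_features)], SEMANTIC_CODEBOOK)[0]
perturbed_semantic = quantize([mean_pool(perturbed_features)], SEMANTIC_CODEBOOK)[0]

print("patch codes:", clean_codes, perturbed_codes)
print("semantic codes:", clean_semantic, perturbed_semantic)
```

In this toy setting the perturbed features map to the same indices as the clean ones, so anything downstream that sees only the codes cannot distinguish the two inputs. A larger perturbation that crosses a codebook boundary would change the codes, so the sketch says nothing about the defense rates reported for Q-MLLM.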
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
