ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Kun Wang; Cheng Qian; Miao Yu; Lilan Peng; Liang Lin; Jiaming Zhang; Tianyu Zhang; Yu Cheng; Yang Wang

arXiv:2604.19083·cs.CR·April 22, 2026

TL;DR

ProjLens is an interpretability framework that reveals how backdoor attacks operate in multimodal models, identifying low-rank structures and activation mechanisms that enable vulnerabilities.

Contribution

The paper introduces ProjLens, a novel method to understand backdoor mechanisms in multimodal models, highlighting differences from text-only models and uncovering key activation patterns.

Findings

01

Backdoor injection updates appear full-rank overall and lack dedicated trigger neurons.

02

Backdoor-critical parameters are encoded within a low-rank subspace of the projector.

03

During backdoor activation, the magnitude of the semantic shift scales linearly with the input norm.
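
The three findings can be illustrated with a toy numerical sketch. It is not the authors' code or experimental setup: the matrix `delta` is a synthetic stand-in for a projector weight update, and the helper `effective_rank`, the 99% energy threshold, and all dimensions are assumptions chosen for illustration. It shows how an update can be numerically full-rank while almost all of its energy sits in a few singular directions, and that for a purely linear map the size of the induced shift grows in proportion to the input norm.

```python
import numpy as np


def effective_rank(delta, energy=0.99):
    """Smallest k whose top-k singular values hold `energy` of the squared spectrum."""
    s = np.linalg.svd(delta, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


rng = np.random.default_rng(0)
d_in, d_out, k = 64, 48, 2  # hypothetical projector dimensions and subspace size

# Synthetic update: a strong rank-k component plus small dense noise.
strong = rng.normal(size=(d_out, k)) @ rng.normal(size=(k, d_in))
noise = 0.01 * rng.normal(size=(d_out, d_in))
delta = strong + noise

full_rank = np.linalg.matrix_rank(delta)  # numerical rank of the whole update
eff_rank = effective_rank(delta)          # rank needed to capture 99% of the energy

# For a linear map, the shift delta @ x scales with the norm of x.
x = rng.normal(size=d_in)
x /= np.linalg.norm(x)
scales = np.array([1.0, 2.0, 4.0])
shifts = np.array([np.linalg.norm(delta @ (a * x)) for a in scales])
ratio = shifts / scales  # constant when scaling is linear

print(full_rank, eff_rank, ratio)
```

In this toy setting the full-rank appearance comes entirely from the added noise, which is an assumption of the sketch rather than a claim about how real backdoor updates are structured.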

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable success in cross-modal understanding and generation, yet their deployment is threatened by critical safety vulnerabilities. While prior work has demonstrated that backdoors can be planted in MLLMs by poisoning fine-tuning data to manipulate inference, the underlying mechanisms of these attacks remain opaque, which complicates both understanding and mitigation. To bridge this gap, we propose ProjLens, an interpretability framework designed to demystify MLLM backdoors. We first establish that normal downstream task alignment, even when restricted to projector fine-tuning, introduces vulnerability to backdoor injection, whose activation mechanism differs from that observed in text-only LLMs. Through extensive experiments across four backdoor variants, we uncover: (1) Low-Rank Structure: Backdoor injection updates appear…

Code & Models

Repositories

https://anonymous.4open.science/r/ProjLens-8FD7
