TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning
Mingzu Liu, Hao Fang, Runmin Cong

TL;DR
This paper introduces TCAP, an unsupervised defense that filters backdoor (poisoned) samples from the fine-tuning data of multimodal large language models by analyzing how those samples shift attention among three input components. It is designed to generalize across trigger types and modalities.
Contribution
The paper presents TCAP, an unsupervised backdoor detection framework that profiles attention across the system-instruction, vision-input, and user-query components of the input to identify poisoned samples in MLLM fine-tuning data.
Findings
TCAP effectively detects backdoor samples across various architectures and attacks.
Attention distribution divergence is a universal fingerprint of poisoned data.
TCAP outperforms existing defenses in robustness and generalization.
Abstract
Fine-Tuning-as-a-Service (FTaaS) facilitates the customization of Multimodal Large Language Models (MLLMs) but introduces critical backdoor risks via poisoned data. Existing defenses either rely on supervised signals or fail to generalize across diverse trigger types and modalities. In this work, we uncover a universal backdoor fingerprint, attention allocation divergence, in which poisoned samples disrupt the balanced attention distribution across three functional components: system instructions, vision inputs, and user textual queries, regardless of trigger morphology. Motivated by this insight, we propose Tri-Component Attention Profiling (TCAP), an unsupervised defense framework to filter backdoor samples. TCAP decomposes cross-modal attention maps into the three components, identifies trigger-responsive attention heads via Gaussian Mixture Model (GMM) statistical profiling, and isolates…
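The two ideas named in the abstract, splitting attention mass by component and separating samples with a GMM, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the token spans, the choice of the system-instruction share as the score, the synthetic values, and the assumption that poisoned samples form the minority cluster are all hypothetical and chosen only for illustration.

```python
import math

def component_attention(attn_row, spans):
    """Share of one attention row that falls into each named token span."""
    total = sum(attn_row)
    return {name: sum(attn_row[s:e]) / total for name, (s, e) in spans.items()}

def fit_gmm_1d(xs, iters=50):
    """Two-component 1-D Gaussian mixture fitted with plain EM.

    Returns the per-sample responsibilities and the mixture weights.
    """
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                                  # initialise means at the extremes
    var = [((hi - lo) / 4) ** 2 + 1e-6] * 2
    pi = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample
        resp = []
        for x in xs:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-6
            pi[k] = nk / len(xs)
    return resp, pi

# Hypothetical layout: tokens 0-1 are system, 2-4 are vision, 5 is the user query.
spans = {"system": (0, 2), "vision": (2, 5), "text": (5, 6)}
shares = component_attention([0.1, 0.1, 0.2, 0.2, 0.2, 0.2], spans)

# Synthetic per-sample scores (system-instruction share); the last two are
# made-up outliers standing in for poisoned samples.
scores = [0.20, 0.22, 0.25, 0.21, 0.24, 0.23, 0.65, 0.70]
resp, pi = fit_gmm_1d(scores)
minority = 0 if pi[0] < pi[1] else 1               # assumption: poison is the minority
flagged = [i for i, r in enumerate(resp) if r[minority] > 0.5]
print(shares, flagged)
```

In a real setting the attention rows would come from a specific head of the fine-tuned model, and the spans from the prompt template's token boundaries; both are outside the scope of this sketch.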
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks
