TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

Mingzu Liu; Hao Fang; Runmin Cong

arXiv:2601.21692·cs.AI·January 30, 2026

TL;DR

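This paper introduces TCAP, an unsupervised method that filters backdoor-poisoned samples from the fine-tuning data of multimodal large language models. It flags samples whose attention allocation diverges across three components (system instructions, vision inputs, and user textual queries), and it is reported to be more robust and to generalize better than existing defenses.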

Contribution

The paper presents a novel unsupervised backdoor detection framework, TCAP, that leverages attention profiling across three components to identify poisoned samples in MLLMs.

Findings

01. TCAP effectively detects backdoor samples across various architectures and attacks.

02. Attention distribution divergence is a universal fingerprint of poisoned data.

03. TCAP outperforms existing defenses in robustness and generalization.

Abstract

Fine-Tuning-as-a-Service (FTaaS) facilitates the customization of Multimodal Large Language Models (MLLMs) but introduces critical backdoor risks via poisoned data. Existing defenses either rely on supervised signals or fail to generalize across diverse trigger types and modalities. In this work, we uncover a universal backdoor fingerprint, attention allocation divergence, in which poisoned samples disrupt the balanced attention distribution across three functional components (system instructions, vision inputs, and user textual queries) regardless of trigger morphology. Motivated by this insight, we propose Tri-Component Attention Profiling (TCAP), an unsupervised defense framework to filter backdoor samples. TCAP decomposes cross-modal attention maps into the three components, identifies trigger-responsive attention heads via Gaussian Mixture Model (GMM) statistical profiling, and isolates…
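
The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `component_attention`, the span layout, and the toy attention rows are all assumptions made for illustration. It only shows how one query token's attention row might be split into the share of mass landing on the system, vision, and text spans, so that a shift in allocation becomes visible.

```python
from typing import Dict, List, Tuple

# Hypothetical span layout: component name -> (start, end) token indices,
# with `end` exclusive. Real prompts would supply their own boundaries.
Spans = Dict[str, Tuple[int, int]]


def component_attention(attn_row: List[float], spans: Spans) -> Dict[str, float]:
    """Return the share of one attention row that falls on each component."""
    # Sum the raw attention weights inside each component's token span.
    mass = {name: sum(attn_row[start:end]) for name, (start, end) in spans.items()}
    total = sum(mass.values())
    if total <= 0:
        raise ValueError("attention row has no mass on the given spans")
    # Normalize so the three shares sum to 1.
    return {name: m / total for name, m in mass.items()}


# Toy example (made-up numbers): 2 system tokens, 4 vision tokens, 2 text tokens.
spans = {"system": (0, 2), "vision": (2, 6), "text": (6, 8)}
clean_row = [0.10, 0.10, 0.15, 0.15, 0.15, 0.15, 0.10, 0.10]
shifted_row = [0.02, 0.02, 0.40, 0.40, 0.05, 0.05, 0.03, 0.03]

clean_profile = component_attention(clean_row, spans)
shifted_profile = component_attention(shifted_row, spans)
```

In this toy setup, the second row concentrates far more of its mass on the vision span than the first. A per-head profile of this kind is the sort of quantity that a statistical model such as a GMM could then be fitted to, although how TCAP does so is not specified in the text available here.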

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks