Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Xuankun Rong; Wenke Huang; Jian Liang; Jinhe Bi; Xun Xiao; Yiming Li; Bo Du; Mang Ye

arXiv:2505.16916·cs.CR·May 23, 2025

TL;DR

This paper introduces BYE, a novel self-supervised framework that detects and filters backdoor samples in multimodal large language models by analyzing attention entropy patterns, without requiring external supervision.

Contribution

The paper presents a backdoor defense that uses attention entropy analysis to identify malicious samples during MLLM fine-tuning, without requiring clean reference data or modifications to the model.

Findings

01. BYE achieves near-zero attack success rates in the reported experiments.

02. BYE maintains high performance on clean tasks.

03. The method is effective across various datasets, models, and trigger types.

Abstract

Multimodal Large Language Models (MLLMs) are increasingly deployed in fine-tuning-as-a-service (FTaaS) settings, where user-submitted datasets adapt general-purpose models to downstream tasks. This flexibility, however, introduces serious security risks, as malicious fine-tuning can implant backdoors into MLLMs with minimal effort. In this paper, we observe that backdoor triggers systematically disrupt cross-modal processing by causing abnormal attention concentration on non-semantic regions--a phenomenon we term attention collapse. Based on this insight, we propose Believe Your Eyes (BYE), a data filtering framework that leverages attention entropy patterns as self-supervised signals to identify and filter backdoor samples. BYE operates via a three-stage pipeline: (1) extracting attention maps using the fine-tuned model, (2) computing entropy scores and profiling sensitive layers via…
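
The entropy signal mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an attention map is available as a NumPy array whose last axis is a distribution over image patches, and the function name `attention_entropy`, the patch count, and the toy distributions are all hypothetical. Because the abstract is cut off partway through stage (2), only the entropy score itself is shown; layer profiling and the final filtering step are not reconstructed here.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    """Shannon entropy (in nats) of each attention distribution.

    `attn` holds non-negative attention weights; the last axis is assumed
    to range over image patches. Hypothetical helper, not from the paper.
    """
    attn = np.asarray(attn, dtype=np.float64)
    # Renormalize so every row along the last axis sums to 1.
    probs = attn / (attn.sum(axis=-1, keepdims=True) + eps)
    # eps inside the log keeps zero-weight patches from producing -inf.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

num_patches = 16  # assumed toy value

# Attention spread evenly over all patches: highest possible entropy.
uniform = np.full(num_patches, 1.0 / num_patches)

# Attention concentrated on one patch, as "attention collapse" suggests.
collapsed = np.full(num_patches, 0.01 / (num_patches - 1))
collapsed[3] = 0.99

h_uniform = attention_entropy(uniform)
h_collapsed = attention_entropy(collapsed)
print(f"uniform:   {h_uniform:.4f}")
print(f"collapsed: {h_collapsed:.4f}")
```

In this toy setting the concentrated map scores far lower entropy than the evenly spread one, which is the kind of separation a filter based on attention entropy would rely on. How the scores are thresholded or grouped is left open, since the source text does not include that part.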

Code & Models

Repositories

xuankunrong/bye (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Face recognition and analysis

Methods: Softmax · Attention Is All You Need