EntropyScan: Towards Model-level Backdoor Detection in LVLMs via Visual Attention Entropy
Xuanyu Ge, Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen

TL;DR
EntropyScan is a novel, lightweight method that detects backdoored LVLMs by analyzing visual attention entropy deviations, without relying on training data or triggers.
Contribution
It introduces a trigger-agnostic, model-level backdoor detection technique based on visual attention entropy analysis in LVLMs.
Findings
Achieves an average F1 score of 98.5% in backdoor detection.
Attains an AUC of 96.6% across multiple architectures and attack scenarios.
Effectively identifies backdoored models by analyzing attention distribution anomalies.
Abstract
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across various tasks, yet they remain vulnerable to backdoor attacks. Existing defense methods predominantly focus on sample-level defense, which relies on knowledge of the training data or triggers. However, identifying whether a given model is backdoored remains a critical yet unexplored task. To fill this gap, we propose EntropyScan, a lightweight and trigger-agnostic method for model-level backdoor detection in LVLMs. We first observe that backdoor injection disrupts cross-modal alignment, producing pronounced structural anomalies in how visual attention is allocated on benign samples. Based on this insight, EntropyScan detects backdoored models by quantifying these attention deviations. Specifically, it extracts visual attention distributions from the initial layers of the Large Language Model (LLM)…
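The abstract is cut off before the scoring details, so what follows is only a minimal sketch of what "quantifying visual attention deviations via entropy" could look like, under our own assumptions. It assumes one already has attention rows (weights from a query token over all key tokens) taken from early LLM layers on benign inputs, plus the indices of the image tokens. The function names, the renormalization over visual tokens, the averaging over samples, layers and heads, and the z-score rule against reference clean models are all hypothetical illustrations, not EntropyScan's published procedure.

```python
import math
from statistics import mean, stdev


def visual_attention_entropy(attn_row, visual_idx):
    """Shannon entropy (nats) of one attention row restricted to visual tokens.

    Hypothetical helper: the weights on visual tokens are renormalized to sum
    to 1 before computing entropy, so text-token attention is ignored.
    """
    vis = [attn_row[i] for i in visual_idx]
    total = sum(vis)
    if total <= 0:
        return 0.0  # no attention mass on visual tokens at all
    probs = [w / total for w in vis]
    return -sum(p * math.log(p) for p in probs if p > 0)


def model_entropy_score(samples, visual_idx):
    """Mean visual-attention entropy over benign samples.

    samples: one list per benign input, each holding attention rows
    (e.g. one per early layer and head). Hypothetical aggregation.
    """
    values = [visual_attention_entropy(row, visual_idx)
              for rows in samples for row in rows]
    return mean(values)


def flag_suspect(score, clean_scores, z_thresh=3.0):
    """Hypothetical decision rule: flag a model whose score deviates from
    reference clean models by more than z_thresh standard deviations.
    Two-sided, since the direction of the deviation is not assumed here.
    Requires at least two reference scores.
    """
    mu, sigma = mean(clean_scores), stdev(clean_scores)
    z = (score - mu) / sigma if sigma > 0 else 0.0
    return abs(z) > z_thresh


if __name__ == "__main__":
    # Key 0 is a text token; keys 1-4 are visual tokens.
    spread = [0.5, 0.125, 0.125, 0.125, 0.125]  # uniform over visual tokens
    peaked = [0.0, 1.0, 0.0, 0.0, 0.0]          # collapsed onto one token
    print(visual_attention_entropy(spread, [1, 2, 3, 4]))  # ln(4)
    print(visual_attention_entropy(peaked, [1, 2, 3, 4]))  # 0.0
```

Entropy is a natural summary here because it is high when attention is spread across the image and low when it collapses onto a few tokens, so a structural shift in visual attention allocation shows up as a shift in a single scalar per model.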
