From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense

Binyan Xu; Fan Yang; Xilin Dai; Di Tang; Kehuan Zhang

arXiv:2601.19448·cs.LG·January 28, 2026

Open Access

TL;DR

This paper introduces PRISM, an external semantic auditing framework that uses vision-language models to defend neural networks against backdoor attacks, reporting state-of-the-art results in evaluations spanning 17 datasets and 11 attack types.

Contribution

The paper proposes a paradigm shift from internal diagnosis to external semantic auditing with VLMs, built on a Hybrid VLM Teacher and an adaptive router for robust backdoor defense.

Findings

01 PRISM reduces attack success rate to below 1% on CIFAR-10.

02 PRISM improves clean accuracy while defending against 11 attack types.

03 PRISM is effective across 17 datasets.

Abstract

Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis, relying on methods such as model repair or input robustness, yet these approaches are often fragile under advanced attacks because they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes…
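The general idea of external auditing, where a check independent of the victim model compares an input against class prototypes, can be illustrated with a toy sketch. This is not PRISM's algorithm: the abstract above is truncated and does not specify the procedure. The function names `cosine` and `audit`, the `margin` threshold, and the two-dimensional embeddings are all hypothetical, and a real system would obtain embeddings from a VLM image encoder rather than hand-written lists.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def audit(image_emb, prototypes, victim_label, margin=0.1):
    """Toy external audit (illustrative assumption, not the paper's method).

    Compares the input embedding to each class prototype. If the best-matching
    class differs from the victim model's prediction by more than `margin` in
    similarity, the victim's label is overridden and the input is flagged.
    Returns (final_label, flagged).
    """
    scores = {label: cosine(image_emb, proto) for label, proto in prototypes.items()}
    auditor_label = max(scores, key=scores.get)
    gap = scores[auditor_label] - scores[victim_label]
    if auditor_label != victim_label and gap > margin:
        return auditor_label, True
    return victim_label, False

# Hypothetical prototypes and input embedding, chosen only for illustration.
prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
image_emb = [0.9, 0.1]  # semantically close to "cat"

suspicious = audit(image_emb, prototypes, victim_label="dog")
benign = audit(image_emb, prototypes, victim_label="cat")
```

In this toy setup the auditor overrides the victim only when the semantic evidence disagrees by a clear margin, so a victim prediction that matches the nearest prototype passes through unchanged.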

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection