From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense
Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

TL;DR
This paper introduces PRISM, a novel external semantic auditing framework using vision-language models to defend against backdoor attacks in neural networks, achieving state-of-the-art results across multiple datasets.
Contribution
It proposes a paradigm shift from internal diagnosis to external semantic auditing with VLMs, introducing a Hybrid VLM Teacher and an adaptive router for robust backdoor defense.
Findings
PRISM reduces attack success rate to below 1% on CIFAR-10.
It improves clean accuracy while defending against 11 attack types.
It demonstrates effectiveness across 17 datasets.
Abstract
Deep Neural Networks remain inherently vulnerable to backdoor attacks. Traditional test-time defenses largely operate under the paradigm of internal diagnosis, using methods such as model repairing or input robustness, yet these approaches are often fragile under advanced attacks because they remain entangled with the victim model's corrupted parameters. We propose a paradigm shift from Internal Diagnosis to External Semantic Auditing, arguing that effective defense requires decoupling safety from the victim model via an independent, semantically grounded auditor. To this end, we present a framework harnessing Universal Vision-Language Models (VLMs) as evolving semantic gatekeepers. We introduce PRISM (Prototype Refinement & Inspection via Statistical Monitoring), which overcomes the domain gap of general VLMs through two key mechanisms: a Hybrid VLM Teacher that dynamically refines visual prototypes…
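The abstract is truncated, so PRISM's actual algorithm is not given here. The sketch below only illustrates the general idea of an external auditor that is decoupled from the victim model. It assumes that images have already been embedded by a frozen VLM image encoder, that each class is represented by one prototype vector, that the auditor labels an input by nearest prototype under cosine similarity, and that prototypes are refined with an exponential moving average (EMA). All names (`PrototypeAuditor`, `audit`, `refine`, `momentum`) are hypothetical and are not the authors' code.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


class PrototypeAuditor:
    """Hypothetical external auditor that never reads the victim's weights.

    It keeps one unit-norm prototype per class in the embedding space of a
    frozen VLM image encoder (embeddings are assumed to be precomputed).
    """

    def __init__(self, prototypes: Dict[str, Vector], momentum: float = 0.9):
        self.prototypes = {c: normalize(p) for c, p in prototypes.items()}
        # Assumed EMA rate for prototype refinement; not taken from the paper.
        self.momentum = momentum

    def classify(self, embedding: Vector) -> str:
        """Nearest-prototype label under cosine similarity."""
        e = normalize(embedding)
        return max(self.prototypes, key=lambda c: cosine(e, self.prototypes[c]))

    def refine(self, label: str, embedding: Vector) -> None:
        """Move one prototype toward an accepted sample (EMA), then renormalize."""
        old = self.prototypes[label]
        e = normalize(embedding)
        m = self.momentum
        self.prototypes[label] = normalize(
            [m * p + (1.0 - m) * x for p, x in zip(old, e)]
        )

    def audit(self, embedding: Vector, victim_label: str) -> bool:
        """Accept the victim's label only if the auditor agrees with it.

        Accepted samples refine their class prototype; rejected samples are
        flagged (returned as False) and leave the prototypes untouched.
        """
        auditor_label = self.classify(embedding)
        if auditor_label != victim_label:
            return False
        self.refine(auditor_label, embedding)
        return True


if __name__ == "__main__":
    auditor = PrototypeAuditor({"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
    print(auditor.audit([0.9, 0.1], victim_label="cat"))  # True: labels agree
    print(auditor.audit([0.1, 0.9], victim_label="cat"))  # False: flagged
```

In this sketch, prototypes are refined only on samples where the victim and the auditor agree, so a possibly backdoored prediction cannot pull a prototype toward trigger features. This is a design choice of the illustration, not a claim about how PRISM's Hybrid VLM Teacher or its statistical monitoring works.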
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
