BDFirewall: Towards Effective and Expeditiously Black-Box Backdoor Defense in MLaaS

Ye Li, Chengcheng Zhu, Yanchao Zhao, and Jiale Zhang

arXiv:2508.03307·cs.CR·August 6, 2025

TL;DR

This paper introduces BDFirewall, a black-box defense framework against backdoor attacks in MLaaS, classifying triggers by visibility and applying tailored purification methods to effectively neutralize them without model access.

Contribution

The paper proposes a progressive defense framework that categorizes backdoor triggers by their impact on the patched area (high-, semi-, and low-visibility) and removes them from the most conspicuous to the most subtle in black-box scenarios. It reports outperforming existing methods.

Findings

1. Reduces attack success rate by 33.25% on average

2. Improves poisoned sample accuracy by 29.64%

3. Achieves up to 111x faster inference

Abstract

In this paper, we endeavor to address the challenges of backdoor attack countermeasures in black-box scenarios, thereby fortifying the security of inference under MLaaS. We first categorize backdoor triggers from a new perspective, i.e., their impact on the patched area, and divide them into high-visibility triggers (HVT), semi-visibility triggers (SVT), and low-visibility triggers (LVT). Based on this classification, we propose a progressive defense framework, BDFirewall, that removes these triggers from the most conspicuous to the most subtle, without requiring model access. First, for HVTs, which create the most significant local semantic distortions, we identify and eliminate them by detecting these salient differences. We then restore the patched area to mitigate the adverse impact of this removal process. The localized purification designed for HVTs is, however, ineffective…
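
The progressive ordering described in the abstract (purify the most conspicuous triggers first, then subtler ones, and only then query the model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `mask_salient_patch`, `box_blur`, `firewall`, and `toy_model` are hypothetical names, the two stages are crude stand-ins (mean replacement of outlier pixels and a 3x3 mean filter), and none of it is the paper's actual purification method. The only property taken from the source is that the defense sits in front of the model and needs nothing but its prediction interface.

```python
from typing import Callable, List

Image = List[List[float]]          # grayscale image, pixel values in [0, 1]
Purifier = Callable[[Image], Image]


def mask_salient_patch(img: Image, threshold: float = 0.5) -> Image:
    """Hypothetical stand-in for a high-visibility stage.

    Pixels that deviate from the image mean by more than `threshold`
    are replaced by the mean (a crude 'remove, then restore' step).
    """
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[mean if abs(p - mean) > threshold else p for p in row]
            for row in img]


def box_blur(img: Image) -> Image:
    """Hypothetical stand-in for a subtler-trigger stage: 3x3 mean filter."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(neigh) / len(neigh))
        out.append(row)
    return out


def firewall(stages: List[Purifier],
             query_model: Callable[[Image], str]) -> Callable[[Image], str]:
    """Wrap a black-box model so inputs pass through each stage in order."""
    def guarded(img: Image) -> str:
        for stage in stages:
            img = stage(img)
        return query_model(img)   # only the prediction API is used
    return guarded


def toy_model(img: Image) -> str:
    """Toy black-box classifier with a planted trigger: any very bright pixel."""
    return "backdoor" if max(p for row in img for p in row) > 0.9 else "clean"
```

A guarded endpoint would be built as `firewall([mask_salient_patch, box_blur], toy_model)`. Because the stages only transform the input, they can be reordered or swapped without touching the model, which is the point of a black-box design.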
