BDFirewall: Towards Effective and Expeditiously Black-Box Backdoor Defense in MLaaS
Ye Li, Chengcheng Zhu, Yanchao Zhao, and Jiale Zhang

TL;DR
This paper introduces BDFirewall, a black-box defense against backdoor attacks in MLaaS. It classifies triggers by visibility (high, semi, and low) and applies a purification method tailored to each class, neutralizing them without access to the model.
Contribution
The paper proposes a progressive defense framework that categorizes backdoor triggers by their impact on the patched area and removes them, from the most conspicuous to the most subtle, without model access. The authors report that it outperforms existing methods.
Findings
Reduces attack success rate by 33.25% on average
Improves poisoned sample accuracy by 29.64%
Achieves up to 111x faster inference
Abstract
In this paper, we endeavor to address the challenges of backdoor attack countermeasures in black-box scenarios, thereby fortifying the security of inference under MLaaS. We first categorize backdoor triggers from a new perspective, i.e., their impact on the patched area, and divide them into high-visibility triggers (HVT), semi-visibility triggers (SVT), and low-visibility triggers (LVT). Based on this classification, we propose a progressive defense framework, BDFirewall, that removes these triggers from the most conspicuous to the most subtle, without requiring model access. First, for HVTs, which create the most significant local semantic distortions, we identify and eliminate them by detecting these salient differences. We then restore the patched area to mitigate the adverse impact of this removal process. The localized purification designed for HVTs is, however, ineffective…
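The HVT step described in the abstract (locate a locally salient patch, remove it, then restore the area) can be illustrated with a toy sketch. This is not the authors' implementation: the box-blur residual used as a saliency score, the ring-mean fill used as "restoration", and the names `box_blur`, `find_salient_patch`, and `purify` are all assumptions chosen to keep the example small and dependent only on NumPy.

```python
import numpy as np

def box_blur(img, k=3):
    """Mean filter with edge padding (assumed stand-in for local context)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def find_salient_patch(img, patch=4):
    """Top-left corner of the patch x patch window whose pixels deviate
    most from their blurred surroundings (hypothetical saliency score)."""
    residual = np.abs(img - box_blur(img))
    h, w = img.shape
    best_score, best_pos = -1.0, (0, 0)
    for r in range(h - patch + 1):
        for c in range(w - patch + 1):
            score = residual[r:r + patch, c:c + patch].sum()
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

def purify(img, patch=4):
    """Remove the most salient window and fill it with the mean of the
    one-pixel ring around it (a crude placeholder for real inpainting)."""
    h, w = img.shape
    r, c = find_salient_patch(img, patch)
    out = img.astype(float).copy()
    mask = np.zeros(img.shape, dtype=bool)
    mask[r:r + patch, c:c + patch] = True
    r0, r1 = max(r - 1, 0), min(r + patch + 1, h)
    c0, c1 = max(c - 1, 0), min(c + patch + 1, w)
    ring = out[r0:r1, c0:c1][~mask[r0:r1, c0:c1]]
    out[mask] = ring.mean()
    return out, (r, c)

# Toy input: flat grey image with a 4x4 checkerboard "trigger" at (10, 10).
img = np.full((16, 16), 0.5)
img[10:14, 10:14] = np.indices((4, 4)).sum(axis=0) % 2
restored, pos = purify(img)
```

On this toy input the checkerboard is the only region that disagrees with its neighbourhood, so the window search lands on it and the fill returns the image to flat grey. Real images would need a stronger saliency signal and a proper inpainting model. The sketch also says nothing about SVTs or LVTs, which the abstract indicates require different treatment.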
