Black-box Detection of Backdoor Attacks with Limited Information and Data

Yinpeng Dong; Xiao Yang; Zhijie Deng; Tianyu Pang; Zihao Xiao; Hang Su; Jun Zhu

arXiv:2103.13127 · cs.CR · March 25, 2021

TL;DR

This paper introduces a black-box method for detecting backdoor attacks in neural networks using only query access to the model. It applies gradient-free optimization to reverse-engineer a candidate trigger for each class.

Contribution

The proposed B3D method detects backdoors without requiring the poisoned training data or white-box access to the model, which makes backdoor defense practical in settings where only query access is available.

Findings

1. Effective detection across multiple datasets
2. Works with only query access to models
3. Robust against various backdoor attack types

Abstract

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. An attacker can embed a malicious backdoor in a model by poisoning the training dataset, with the intention of making the infected model give wrong predictions at inference time whenever a specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, existing techniques usually require the poisoned training data or white-box access to the model, neither of which is commonly available in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of…
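The core step described in the abstract, reverse-engineering a trigger for a class using only model queries, can be illustrated with a minimal sketch. It assumes a Neural Cleanse-style trigger (a mask m and a pattern p stamped onto an input as x' = (1 - m) * x + m * p), an objective of cross-entropy toward the target class plus an L1 penalty on the mask, and an NES-style antithetic Gaussian estimate of the gradient built from loss queries alone. The toy model, its hidden backdoor on feature 0, and every name and hyperparameter (`black_box_model`, `reverse_engineer`, `lam`, `sigma`, `pairs`, `lr`) are hypothetical. This is not the authors' B3D implementation.

```python
import math
import random

# Hypothetical toy setup, not taken from the paper.
D = 6       # input dimension
TARGET = 2  # class whose trigger we try to reverse-engineer


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def black_box_model(x):
    """Query-only stand-in model: returns class probabilities, no gradients.
    Hypothetical backdoor: feature 0 near 1 forces class 2."""
    z = [4.0 * (x[1] - 0.5), 4.0 * (x[2] - 0.5), 16.0 * x[0] - 8.0]
    zmax = max(z)
    exps = [math.exp(v - zmax) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def apply_trigger(x, mask, pattern):
    # Neural Cleanse-style stamping (assumed): x' = (1 - m) * x + m * p
    return [(1 - m) * xi + m * p for xi, m, p in zip(x, mask, pattern)]


def decode(theta):
    # Map unconstrained parameters to a mask and pattern in (0, 1)
    return [sigmoid(t) for t in theta[:D]], [sigmoid(t) for t in theta[D:]]


def loss(theta, batch, lam):
    # Uses only the model's output scores: cross-entropy to TARGET + L1 on mask
    mask, pattern = decode(theta)
    ce = 0.0
    for x in batch:
        probs = black_box_model(apply_trigger(x, mask, pattern))
        ce -= math.log(max(probs[TARGET], 1e-12))
    return ce / len(batch) + lam * sum(mask)


def reverse_engineer(batch, steps=200, pairs=10, sigma=0.1, lr=0.5, lam=0.05, seed=0):
    rng = random.Random(seed)
    theta = [-2.0] * D + [0.0] * D  # small initial mask, mid-grey pattern
    for _ in range(steps):
        # Antithetic Gaussian (NES-style) gradient estimate from loss queries
        grad = [0.0] * (2 * D)
        for _ in range(pairs):
            u = [rng.gauss(0.0, 1.0) for _ in range(2 * D)]
            plus = loss([t + sigma * ui for t, ui in zip(theta, u)], batch, lam)
            minus = loss([t - sigma * ui for t, ui in zip(theta, u)], batch, lam)
            scale = (plus - minus) / (2 * sigma * pairs)
            grad = [g + scale * ui for g, ui in zip(grad, u)]
        theta = [t - lr * g for t, g in zip(theta, grad)]  # plain descent step
    return decode(theta)


def attack_success_rate(batch, mask, pattern):
    # Fraction of stamped inputs that the model assigns to TARGET
    hits = 0
    for x in batch:
        probs = black_box_model(apply_trigger(x, mask, pattern))
        hits += max(range(len(probs)), key=probs.__getitem__) == TARGET
    return hits / len(batch)


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Clean inputs: the backdoor feature (index 0) is always inactive
    clean = [[0.0] + [data_rng.random() for _ in range(D - 1)] for _ in range(10)]
    mask, pattern = reverse_engineer(clean)
    print("mask:", [round(m, 2) for m in mask])
    print("success rate:", attack_success_rate(clean, mask, pattern))
```

On this toy model the recovered mask should concentrate on feature 0, the planted trigger. For detection, the same optimization would be run for every class; an abnormally small trigger for one class is the usual warning sign in this line of work, although the exact B3D decision rule falls in the truncated part of the abstract and is not reproduced here.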
