Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering

Rui Zhu; Di Tang; Siyuan Tang; Guanhong Tao; Shiqing Ma; XiaoFeng Wang; Haixu Tang

arXiv:2301.12318·cs.CR·July 23, 2024·1 cites

Open Access

TL;DR

This paper investigates the effectiveness of gradient-based backdoor detection methods, reveals their underlying mechanism, and introduces Gradient Shaping (GRASP), a novel attack enhancement that challenges existing detection techniques without compromising attack success.

Contribution

The paper provides the first analysis of why gradient-based trigger inversion works, and proposes GRASP, a new attack method that reduces change rate around triggers without losing backdoor effectiveness.

Findings

01. Existing attacks have low change rates around triggers, aiding detection.

02. GRASP can reduce change rates without affecting backdoor success.

03. Gradient Shaping does not weaken stealthy attacks against detection methods.

Abstract

Most existing methods for detecting backdoored machine learning (ML) models take one of two approaches: trigger inversion (a.k.a. reverse engineering) and weight analysis (a.k.a. model diagnosis). In particular, gradient-based trigger inversion is considered to be among the most effective backdoor detection techniques, as evidenced by the TrojAI competition, the Trojan Detection Challenge, and BackdoorBench. However, little has been done to understand why this technique works so well and, more importantly, whether it raises the bar for backdoor attacks. In this paper, we report the first attempt to answer this question by analyzing the change rate of the backdoored model around its trigger-carrying inputs. Our study shows that existing attacks tend to inject backdoors characterized by a low change rate around trigger-carrying inputs, which are easy to capture by gradient-based trigger…
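To make the notion of a "change rate around trigger-carrying inputs" concrete, the following is a minimal sketch and not the paper's formal definition, which this page does not reproduce. It assumes a hypothetical one-dimensional toy "model" (`toy_target_score`) whose target-class score peaks at an assumed trigger value, and it uses the largest finite-difference slope within a small radius (`local_change_rate`) as a stand-in for the change rate. All names, the Gaussian shape, and the radius are illustrative assumptions.

```python
import math


def toy_target_score(x, trigger=1.0, width=1.0):
    """Hypothetical 1-D model: target-class score as a Gaussian bump
    centred on the trigger value. A smaller width gives a narrower bump."""
    return math.exp(-((x - trigger) / width) ** 2)


def local_change_rate(f, x, radius=0.05, steps=20):
    """Assumed proxy for the change rate at x: the largest
    finite-difference slope |f(x + d) - f(x)| / |d| over offsets d
    sampled in [-radius, radius] (excluding d = 0)."""
    best = 0.0
    for i in range(1, steps + 1):
        d = radius * i / steps
        for delta in (d, -d):
            slope = abs(f(x + delta) - f(x)) / abs(delta)
            best = max(best, slope)
    return best


# Two toy backdoors evaluated at the trigger-carrying input x = 1.0.
wide_rate = local_change_rate(lambda x: toy_target_score(x, width=1.0), 1.0)
narrow_rate = local_change_rate(lambda x: toy_target_score(x, width=0.05), 1.0)

print("wide bump change rate:  ", wide_rate)
print("narrow bump change rate:", narrow_rate)
```

In this toy setting the wide bump yields a much smaller local change rate than the narrow one. The wide case corresponds to the low-change-rate regime that the abstract describes as easy for gradient-based trigger inversion to capture. A real analysis would work with high-dimensional inputs and a trained network rather than a scalar function.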


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis