How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection
Mantas Mazeika, Bo Li, David Forsyth

TL;DR
This paper introduces GRAD², a novel and efficient defense against model stealing attacks. It redirects the adversary's training updates in a targeted way, preserving model utility for benign users and requiring less computation than prior defenses.
Contribution
The paper proposes a provably optimal gradient redirection algorithm and a coordinated defense strategy, which together substantially improve on prior model stealing defenses.
Findings
GRAD² outperforms previous defenses in accuracy and efficiency.
The method maintains high utility with low computational overhead.
Gradient redirection enables reprogramming adversaries' behavior.
Abstract
Model stealing attacks present a dilemma for public machine learning APIs. To protect financial investments, companies may be forced to withhold important information about their models that could facilitate theft, including uncertainty estimates and prediction explanations. This compromise is harmful not only to users but also to external transparency. Model stealing defenses seek to resolve this dilemma by making models harder to steal while preserving utility for benign users. However, existing defenses have poor performance in practice, either requiring enormous computational overheads or severe utility trade-offs. To meet these challenges, we present a new approach to model stealing defenses called gradient redirection. At the core of our approach is a provably optimal, efficient algorithm for steering an adversary's training updates in a targeted manner. Combined with improvements…
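To make "steering an adversary's training updates" concrete, here is a minimal sketch of one way a targeted redirection step can be posed. This is not the authors' GRAD² implementation, and every name in it is hypothetical. It assumes that (1) the adversary trains a surrogate with cross-entropy on the returned posterior ỹ, so its update is proportional to Σ_i ỹ_i ∇θ log p_i(x), which is linear in ỹ; (2) the defender has per-class log-probability gradients from its own copy of a surrogate and a target direction d; (3) alignment is measured by the plain inner product; and (4) utility is protected by an ℓ1 budget ‖ỹ − y‖₁ ≤ ε. Under these assumptions the objective Σ_i c_i ỹ_i with c_i = ⟨d, ∇θ log p_i⟩ is linear over the simplex intersected with an ℓ1 ball, so a greedy rule is optimal: move up to ε/2 probability mass to the highest-scoring class, taking it from the lowest-scoring classes first.

```python
from typing import List


def redirection_scores(class_grads: List[List[float]], target: List[float]) -> List[float]:
    """Score c_i = <target, grad_theta log p_i(x)> for each class i.

    Hypothetical inputs: class_grads[i] is the flattened gradient of a surrogate's
    log-probability for class i; target is the direction the defender wants the
    adversary's update to follow.
    """
    return [sum(g_k * t_k for g_k, t_k in zip(g, target)) for g in class_grads]


def redirect_posterior(y: List[float], scores: List[float], eps: float) -> List[float]:
    """Maximize sum_i scores[i] * y_tilde[i] over probability vectors with
    ||y_tilde - y||_1 <= eps. Because the objective is linear, shifting mass
    from low-score classes to the single best class is optimal."""
    k = len(y)
    best = max(range(k), key=lambda i: scores[i])
    # Shifting delta mass to `best` uses 2 * delta of the L1 budget,
    # and at most 1 - y[best] mass exists outside `best`.
    movable = min(eps / 2.0, 1.0 - y[best])
    y_tilde = list(y)
    # Fractional-knapsack order: drain the lowest-scoring classes first.
    for i in sorted(range(k), key=lambda i: scores[i]):
        if movable <= 0.0:
            break
        if i == best:
            continue
        take = min(y_tilde[i], movable)
        y_tilde[i] -= take
        y_tilde[best] += take
        movable -= take
    return y_tilde


if __name__ == "__main__":
    scores = redirection_scores([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [-0.5, 1.0])
    print(scores)
    print(redirect_posterior([0.7, 0.2, 0.1], scores, eps=0.4))
```

With three classes and ε = 0.4, the sketch moves 0.2 of probability mass from class 0 (lowest score) to class 1 (highest score), so the returned posterior stays close to the original while pushing the adversary's cross-entropy gradient toward the target direction. Choosing a different target d is what would let a defender aim the adversary's updates at a specific behavior.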
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)
