Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models

Shawn Shan; Wenxin Ding; Emily Wenger; Haitao Zheng; Ben Y. Zhao

arXiv:2205.10686 · cs.CR · October 18, 2022

TL;DR

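This paper introduces Neo, a post-breach recovery system for leaked DNN models. Neo creates new versions of a leaked model and pairs them with an inference-time filter that detects and removes adversarial examples generated on previously leaked versions. It supports recovery across repeated breaches and remains effective against adaptive attacks.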

Contribution

Neo is a novel system that creates multiple versions of a model and pairs them with an inference-time filter that detects and removes adversarial examples crafted on leaked versions.

Findings

01 Neo achieves high accuracy in filtering attacks across various tasks.

02 Neo provides 7-10 recoveries against repeated breaches.

03 Neo remains effective against strong adaptive attacks.

Abstract

Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker "white-box" access to generate adversarial examples, a threat model that has no practical robust defenses. For practitioners who have invested years and millions into proprietary DNNs, e.g., for medical imaging, this seems like an inevitable disaster looming on the horizon. In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of leaked models, alongside an inference-time filter that detects and removes adversarial examples generated on previously leaked models. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects the overfitting of attacks to the…
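
The abstract is cut off before it states the exact detection rule, so the following is only a rough illustration of the idea it describes: an adversarial example crafted on a leaked version tends to fit that version far better than a freshly deployed one. The sketch assumes the filter can compare per-input loss between the two versions. The loss-gap rule, the function names `cross_entropy` and `flag_overfit_inputs`, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Per-example cross-entropy of integer labels under softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def flag_overfit_inputs(leaked_logits, current_logits, threshold=0.5):
    """Flag inputs that fit a leaked version much better than the current one.

    Assumption (not from the source): the label predicted by the current
    version is scored under both versions, and an input is flagged when the
    leaked version's loss is lower by more than `threshold`.
    """
    current_logits = np.asarray(current_logits, dtype=float)
    labels = np.argmax(current_logits, axis=1)
    gap = cross_entropy(current_logits, labels) - cross_entropy(leaked_logits, labels)
    return gap > threshold


# Toy logits for two inputs over three classes (made-up numbers).
leaked = np.array([[4.0, 0.0, 0.0],    # input 0: both versions agree equally
                   [0.0, 10.0, 0.0]])  # input 1: leaked version is overconfident
current = np.array([[4.0, 0.0, 0.0],
                    [0.0, 1.0, 0.5]])
print(flag_overfit_inputs(leaked, current))
```

In this toy case only the second input shows a large loss gap between versions. A real filter would need a threshold calibrated on benign data and would have to handle several previously leaked versions, neither of which the truncated abstract specifies.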
