Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models
Shawn Shan, Wenxin Ding, Emily Wenger, Haitao Zheng, Ben Y. Zhao

TL;DR
This paper introduces Neo, a system for post-breach recovery of leaked DNN models. Neo deploys new model versions and filters out adversarial examples generated on previously leaked versions, and it remains effective across repeated breaches and against adaptive attacks.
Contribution
Neo is a novel system that creates new versions of a leaked model and pairs them with an inference-time filter that detects and removes adversarial examples generated on previously leaked versions.
Findings
Neo filters adversarial examples generated on leaked models with high accuracy across a range of classification tasks.
Neo supports 7 to 10 successive recoveries when the model is breached repeatedly.
Neo remains effective against strong adaptive attacks.
Abstract
Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker "white-box" access to generate adversarial examples, a threat model that has no practical robust defenses. For practitioners who have invested years and millions into proprietary DNNs, e.g. medical imaging, this seems like an inevitable disaster looming on the horizon. In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of leaked models, alongside an inference time filter that detects and removes adversarial examples generated on previously leaked models. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects the overfitting of attacks to the…
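The abstract does not spell out the filter's detection statistic. As a rough illustration only, the sketch below assumes a stand-in statistic: an attack that overfits a leaked version should drive that version's loss on the input unusually low. Each leaked version therefore gets a threshold calibrated on benign inputs, and an input is flagged if its loss on any leaked version falls below that version's threshold. The function names `calibrate_threshold` and `is_flagged`, the loss-based statistic, and the `fpr` parameter are hypothetical and are not Neo's published method.

```python
from typing import Sequence


def calibrate_threshold(benign_losses: Sequence[float], fpr: float = 0.05) -> float:
    """Return a loss threshold below which at most about `fpr` of benign inputs fall.

    ASSUMPTION: Neo's actual detection statistic is not given here; the
    per-input loss on a leaked model version is a hypothetical stand-in.
    """
    if not benign_losses:
        raise ValueError("need at least one benign loss to calibrate")
    ordered = sorted(benign_losses)
    # Lower-tail quantile index, clamped so it stays inside the list.
    k = min(int(fpr * len(ordered)), len(ordered) - 1)
    return ordered[k]


def is_flagged(losses_on_leaked: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Flag an input whose loss on ANY leaked version is below that version's threshold.

    Assumed intuition: an adversarial example crafted on a leaked version
    overfits that version, so its loss there is abnormally low.
    """
    if len(losses_on_leaked) != len(thresholds):
        raise ValueError("need one loss and one threshold per leaked version")
    return any(loss < t for loss, t in zip(losses_on_leaked, thresholds))


if __name__ == "__main__":
    # Hypothetical benign losses measured on one leaked version.
    benign = [0.9, 1.1, 0.8, 1.3, 1.0, 0.7, 1.2, 0.95, 1.05, 0.85]
    t = calibrate_threshold(benign, fpr=0.1)  # 0.8 for this data
    print(t)
    print(is_flagged([0.05], [t]))  # suspiciously low loss -> True
    print(is_flagged([0.9], [t]))   # typical benign loss -> False
```

Keeping a separate threshold per leaked version reflects the abstract's point that the classification surfaces of different versions are slightly offset; the per-version calibration itself is a design assumption of this sketch.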
