Model Reconstruction from Model Explanations
Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt

TL;DR
This paper demonstrates that gradient-based explanations can be used to efficiently reconstruct models, revealing a tension between model secrecy and interpretability, supported by theoretical algorithms and practical heuristics.
Contribution
It introduces a provable algorithm for reconstructing two-layer ReLU networks from gradient queries and provides heuristics that outperform existing reconstruction attacks.
Findings
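Gradient-based explanations of a model quickly reveal the model itself, in both theory and experiment.
The provable algorithm's query count is independent of the input dimension and nearly optimal in its dependence on the model size.
The heuristics are orders of magnitude more query-efficient than existing reconstruction attacks.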
Abstract
We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations. On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and nearly optimal in its dependence on the model size. Of interest not only from a learning-theoretic perspective, this result highlights the power of gradients rather than labels as a learning primitive. Complementing our theory, we give effective heuristics for reconstructing models from gradient explanations that are orders of magnitude more query-efficient than reconstruction attacks relying on…
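To make the claim about gradients as a learning primitive concrete, here is a minimal sketch of why a gradient query on a two-layer ReLU network is so informative. It is a toy illustration, not the paper's algorithm. It assumes a bias-free network f(x) = sum_i a_i * relu(w_i . x) with no more hidden units than input dimensions and generic (random) weights. The names `make_network`, `gradient_query`, and `recover_jumps` are hypothetical. Along a line x(t) = u + t*v the gradient is piecewise constant, and each jump equals one product a_i * w_i up to sign, so bisecting on t recovers the hidden units one by one.

```python
import numpy as np


def make_network(rng, dim=20, hidden=5):
    """Random toy network f(x) = sum_i a_i * relu(w_i . x) (bias-free by assumption)."""
    W = rng.normal(size=(hidden, dim))  # hidden-layer weight vectors w_i
    # Output weights a_i, kept away from zero so every unit is visible.
    a = rng.choice([-1.0, 1.0], size=hidden) * rng.uniform(0.5, 1.5, size=hidden)
    return W, a


def gradient_query(W, a, x):
    """Gradient of f at x: the sum of a_i * w_i over the units active at x."""
    active = (W @ x > 0).astype(float)
    return (a * active) @ W


def recover_jumps(query, u, v, t_lo=-1e4, t_hi=1e4, tol=1e-6):
    """Bisect along x(t) = u + t*v and collect the gradient jumps.

    Assumes generic weights, so equal gradients at both ends of an
    interval mean that no unit switches on or off inside it.
    """
    jumps = []

    def search(lo, hi, g_lo, g_hi):
        if np.allclose(g_lo, g_hi, rtol=0.0, atol=1e-9):
            return  # same activation pattern at both ends: nothing to find
        if hi - lo < tol:
            jumps.append(g_hi - g_lo)  # equals +/- a_i * w_i for one unit
            return
        mid = 0.5 * (lo + hi)
        g_mid = query(u + mid * v)
        search(lo, mid, g_lo, g_mid)
        search(mid, hi, g_mid, g_hi)

    search(t_lo, t_hi, query(u + t_lo * v), query(u + t_hi * v))
    return jumps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, a = make_network(rng)
    u, v = rng.normal(size=W.shape[1]), rng.normal(size=W.shape[1])
    jumps = recover_jumps(lambda x: gradient_query(W, a, x), u, v)
    print(f"recovered {len(jumps)} of {W.shape[0]} hidden units (up to sign)")
```

In this sketch the number of queries depends on the number of hidden units and the bisection depth, not on the input dimension, because each query returns a full gradient vector. A unit whose switching point lies outside the searched interval [t_lo, t_hi] would be missed, and the signs of the recovered products are left unresolved.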
