TL;DR
This paper introduces "Adaptive Misinformation", a defense against model stealing attacks that selectively returns incorrect outputs for out-of-distribution (OOD) queries, substantially reducing attack success while preserving accuracy for legitimate users.
Contribution
It proposes a novel defense strategy that exploits the OOD query pattern of attacks to degrade clone model accuracy with minimal impact on benign users.
Findings
Reduces the accuracy of the attacker's clone model by up to 40%.
Keeps the accuracy loss for benign users within 0.5%.
Achieves a better security vs. accuracy trade-off than existing defenses.
Abstract
Deep Neural Networks (DNNs) are susceptible to model stealing attacks, which allow a data-limited adversary with no knowledge of the training dataset to clone the functionality of a target model, just by using black-box query access. Such attacks are typically carried out by querying the target model using inputs that are synthetically generated or sampled from a surrogate dataset to construct a labeled dataset. The adversary can use this labeled dataset to train a clone model, which achieves a classification accuracy comparable to that of the target model. We propose "Adaptive Misinformation" to defend against such model stealing attacks. We identify that all existing model stealing attacks invariably query the target model with Out-Of-Distribution (OOD) inputs. By selectively sending incorrect predictions for OOD queries, our defense substantially degrades the accuracy of the…
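The idea of answering in-distribution queries honestly and OOD queries with incorrect predictions can be illustrated with a minimal sketch. This is not the paper's implementation: the use of the maximum softmax probability as an OOD score, the hard `threshold` of 0.9, and the `misinformation` helper are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def misinformation(probs: Sequence[float]) -> Vector:
    """Hypothetical misinformation function (an assumption, not the paper's):
    put all probability mass on the class the model finds least likely."""
    wrong = min(range(len(probs)), key=lambda i: probs[i])
    return [1.0 if i == wrong else 0.0 for i in range(len(probs))]


def defended_predict(logits: Sequence[float], threshold: float = 0.9) -> Vector:
    """Return the true prediction for confident (assumed in-distribution)
    queries and a misleading one for low-confidence (assumed OOD) queries.

    Assumption: the maximum softmax probability serves as the OOD score,
    and `threshold` is an illustrative value, not a tuned one.
    """
    probs = softmax(logits)
    if max(probs) >= threshold:
        return probs              # treated as in-distribution: answer honestly
    return misinformation(probs)  # treated as OOD: answer incorrectly


if __name__ == "__main__":
    print(defended_predict([10.0, 0.0, 0.0]))  # confident query
    print(defended_predict([0.2, 0.1, 0.0]))   # low-confidence query
```

In this sketch the threshold controls the trade-off the summary describes: raising it misleads more attacker queries but also risks corrupting answers to benign inputs on which the model is simply uncertain.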
Videos
Defending Against Model Stealing Attacks With Adaptive Misinformation (YouTube)
