HoneypotNet: Backdoor Attacks Against Model Extraction
Yixu Wang, Tianle Gu, Yan Teng, Yingchun Wang, Xingjun Ma

TL;DR
This paper introduces HoneypotNet, a lightweight backdoor attack used as a defense: it poisons the victim model's outputs so that substitute models trained on them are backdoored, deterring model extraction while maintaining the victim model's performance.
Contribution
HoneypotNet is a new backdoor method that modifies model outputs to poison substitute models, providing an attack-as-defense paradigm against model extraction.
Findings
High success rate in injecting backdoors into substitute models
Disrupts functionality of extracted models effectively
Works across four benchmark datasets
Abstract
Model extraction attacks are a type of inference-time attack that approximates the functionality and performance of a black-box victim model by issuing a number of queries to the model and then using the model's predictions to train a substitute model. These attacks pose severe security threats to production models and MLaaS platforms and can cause significant monetary losses for model owners. A body of work has proposed defenses against model extraction attacks, including active defenses that modify the model's outputs or increase the query overhead to prevent extraction, and passive defenses that detect malicious queries or use watermarks for post-hoc verification. In this work, we introduce a new defense paradigm called attack as defense, which modifies the model's output to be poisonous such that any malicious…
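The extraction procedure the abstract describes (query a black-box model, then train a substitute on its predictions) can be illustrated with a toy sketch. This is not the paper's code or its defense. The victim here is a hypothetical fixed linear classifier, and the names `victim_predict` and `extract`, the dimensions, and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical victim: a fixed linear classifier (5 features, 3 classes).
# The attacker never reads these weights directly.
_W_victim = rng.normal(size=(5, 3))

def victim_predict(x):
    """Black-box API: returns class probabilities only."""
    return softmax(x @ _W_victim)

def extract(n_queries=2000, steps=500, lr=0.5):
    """Train a substitute using only the victim's outputs on attacker queries."""
    queries = rng.normal(size=(n_queries, 5))
    soft_labels = victim_predict(queries)      # the only information leaked
    W_sub = np.zeros((5, 3))
    for _ in range(steps):
        probs = softmax(queries @ W_sub)
        # Gradient of cross-entropy against the victim's soft labels.
        grad = queries.T @ (probs - soft_labels) / n_queries
        W_sub -= lr * grad
    return W_sub

W_sub = extract()

# Fidelity: how often the substitute's top label matches the victim's
# on fresh inputs the attacker did not query.
test_x = rng.normal(size=(1000, 5))
fidelity = np.mean(
    victim_predict(test_x).argmax(axis=1)
    == softmax(test_x @ W_sub).argmax(axis=1)
)
print(f"label agreement with victim: {fidelity:.2f}")
```

In this toy setting the substitute learns entirely from what `victim_predict` returns, which is why a defense that alters those returned probabilities can influence what the substitute learns.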
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning in Healthcare
