FDINet: Protecting against DNN Model Extraction via Feature Distortion Index
Hongwei Yao, Zheng Li, Haiqin Weng, Feng Xue, Zhan Qin, Kui Ren

TL;DR
FDINET is a novel defense mechanism that detects model extraction attacks by analyzing feature distribution deviations in adversary queries, achieving high accuracy and efficiency across multiple datasets and attack types.
Contribution
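Introduces the Feature Distortion Index (FDI), a metric that quantifies how far the feature distribution of received queries deviates from that of the training set, and FDINET, which uses FDI to detect DNN model extraction attacks and to identify colluding adversaries.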
Findings
Achieves 100% detection accuracy on certain attacks
Detects extraction with just 50 queries at 96.08% confidence
Identifies colluding adversaries with over 91% accuracy
Abstract
Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINET, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the feature distribution from the adversary's queries, we reveal that the feature distribution of these queries deviates from that of the model's training set. Based on this key observation, we propose Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. The proposed FDINET utilizes FDI to train a binary detector and exploits FDI similarity to identify colluding…
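The deviation idea described in the abstract can be illustrated with a small sketch. The abstract as quoted here does not give the FDI formula, so the code assumes a stand-in score: the L2 distance between a query's feature vector and the mean training feature of its predicted class. The function names, the synthetic features, the nearest-centroid "prediction", and the percentile threshold are all hypothetical. The paper trains a binary detector on FDI, which this sketch replaces with a simple threshold for brevity.

```python
import numpy as np

def class_centroids(features, labels, num_classes):
    """Mean feature vector per class, computed from training-set features."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def nearest_centroid(features, centroids):
    """Stand-in for the model's predicted label: index of the closest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def deviation_scores(features, predicted_labels, centroids):
    """Assumed stand-in for FDI: L2 distance to the predicted class centroid."""
    return np.linalg.norm(features - centroids[predicted_labels], axis=1)

rng = np.random.default_rng(0)
num_classes, dim = 3, 8

# Synthetic "training" features: tight clusters around distinct class means.
means = rng.normal(scale=5.0, size=(num_classes, dim))
train_labels = np.repeat(np.arange(num_classes), 200)
train_features = means[train_labels] + rng.normal(scale=0.5, size=(600, dim))
centroids = class_centroids(train_features, train_labels, num_classes)

# Benign queries follow the training distribution; adversary queries
# (e.g. surrogate or synthetic data) are drawn from a much wider one.
benign_labels = rng.integers(0, num_classes, size=200)
benign = means[benign_labels] + rng.normal(scale=0.5, size=(200, dim))
adversary = rng.normal(scale=5.0, size=(200, dim))

train_scores = deviation_scores(train_features, train_labels, centroids)
benign_scores = deviation_scores(benign, nearest_centroid(benign, centroids), centroids)
adversary_scores = deviation_scores(adversary, nearest_centroid(adversary, centroids), centroids)

# Flag a query when its score exceeds the 99th percentile of training scores.
threshold = np.percentile(train_scores, 99)
benign_flag_rate = float((benign_scores > threshold).mean())
adversary_flag_rate = float((adversary_scores > threshold).mean())
print(f"benign flagged: {benign_flag_rate:.2f}, adversary flagged: {adversary_flag_rate:.2f}")
```

On this synthetic data the adversary queries sit far from every class centroid, so their scores are larger on average and they are flagged more often than benign queries. Real DNN features and real attack queries will be less cleanly separated.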
Taxonomy
Topics: Adversarial Robustness in Machine Learning
