MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models
Xueqi Cheng, Minxing Zheng, Shixiang Zhu, Yushun Dong

TL;DR
MISLEADER is a novel defense method against model extraction attacks that uses ensembles of distilled models and data augmentation, effectively balancing model utility and robustness without relying on out-of-distribution query assumptions.
Contribution
The paper introduces MISLEADER, a new defense framework that employs ensembles of distilled models and data augmentation to prevent model extraction without OOD query assumptions.
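At inference time, an "ensemble of distilled models" answers each query with several heterogeneous students and combines their outputs. The sketch below shows one way to do that. It assumes uniform averaging of softmax probabilities. The `student_*` functions are hypothetical stand-ins for trained networks (e.g. different architectures), and MISLEADER's actual aggregation rule is not stated in this summary.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a "model" maps a feature vector to a list of logits.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> List[float]:
    """Average the class probabilities of every ensemble member for query x."""
    probs = [softmax(model(x)) for model in models]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


# Toy, hypothetical students standing in for heterogeneous distilled networks.
def student_a(x: Sequence[float]) -> List[float]:
    return [x[0], x[1], 0.0]


def student_b(x: Sequence[float]) -> List[float]:
    return [0.5 * x[0], 2.0 * x[1], 0.1]


def student_c(x: Sequence[float]) -> List[float]:
    return [x[0] + x[1], 0.0, 0.2]


if __name__ == "__main__":
    out = ensemble_predict([student_a, student_b, student_c], [1.0, 0.5])
    print([round(p, 3) for p in out])
```

Averaging probabilities keeps the output a valid distribution. The members' disagreement is what a defense of this kind can exploit, because a surrogate trained on the blended outputs is fitting no single consistent model.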
Findings
Reduces model extractability effectively in experiments
Maintains high predictive accuracy on benign inputs
Provides theoretical bounds on defense performance
Abstract
Model extraction attacks aim to replicate the functionality of a black-box model through query access, threatening the intellectual property (IP) of machine-learning-as-a-service (MLaaS) providers. Defending against such attacks is challenging, because a defense must balance efficiency, robustness, and utility preservation in real-world scenarios. Despite recent advances, most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, which lets them detect and disrupt suspicious inputs. However, this assumption is increasingly unreliable, as modern models are trained on diverse datasets and attackers often operate under limited query budgets. As a result, the effectiveness of these defenses is significantly compromised in realistic deployment scenarios. To address this gap, we propose MISLEADER (enseMbles of dIStiLled modEls Against moDel ExtRaction), a novel…
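The threat model in the abstract can be shown with a toy example. An attacker with only query access labels its own inputs through the victim API and fits a surrogate. The sketch is illustrative: the linear `victim`, the logistic-regression surrogate, and the `agreement` fidelity metric are all hypothetical choices, not the paper's attack setup. The queries here come from the victim's own input range, which is exactly the in-distribution case that OOD-detection defenses struggle with.

```python
import math
import random


def victim(x):
    """Black-box victim: a hidden linear classifier returning hard labels (0/1)."""
    w, b = (2.0, -1.0), 0.3  # secret parameters the attacker never sees
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def extract(query_fn, n_queries=500, epochs=50, lr=0.1, seed=0):
    """Fit a logistic-regression surrogate using only query access to query_fn."""
    rng = random.Random(seed)
    # In-distribution queries: sampled from the same input range as benign data.
    queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_queries)]
    labels = [query_fn(q) for q in queries]  # each call spends query budget
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), y in zip(queries, labels):
            z = max(-50.0, min(50.0, w[0] * x0 + w[1] * x1 + b))  # avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to z
            w[0] -= lr * g * x0
            w[1] -= lr * g * x1
            b -= lr * g
    return lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def agreement(f, g, n=2000, seed=1):
    """Fraction of fresh inputs on which two classifiers agree (fidelity)."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return sum(f(p) == g(p) for p in pts) / n


if __name__ == "__main__":
    surrogate = extract(victim)
    print("fidelity:", agreement(victim, surrogate))
```

A defense succeeds to the extent that it lowers this fidelity (or the surrogate's task accuracy) while keeping the victim's answers useful to benign users.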
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
Addresses a Valid Limitation: The paper clearly identifies a practical weakness in many existing model extraction defenses—their dependency on OOD detection. The goal of creating a defense that is robust to in-distribution queries is well-motivated from a security perspective. Strong Empirical Validation: The experimental evaluation is thorough within its defined scope. The authors compare MISLEADER against a wide array of SOTA baselines across multiple datasets (MNIST, CIFAR-10, CIFAR-100) a
Significant Practicality Concerns (Overhead): The primary weakness of this paper is the practicality of the proposed solution. MISLEADER relies on an ensemble of heterogeneous models (e.g., ResNet18_8x, MobileNetV2, DenseNet121). While this architectural diversity is key to the defense, it introduces a massive computational overhead. Both training and, more critically, inference require running multiple distinct models for every single query. For a real-world MLaaS provider, where inference laten
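The reviewer's overhead concern has two parts: per-query latency and total compute. The sketch below separates them under a simplified cost model. It assumes that members either run sequentially on one device or fully in parallel on dedicated devices, and it ignores aggregation, batching, and scheduling overhead. The latency numbers in the usage example are made up for illustration and are not measurements from the paper.

```python
from typing import Sequence


def ensemble_latency_ms(member_latencies_ms: Sequence[float], parallel: bool) -> float:
    """Per-query latency of an ensemble under a simplified cost model."""
    if parallel:
        # One device per member: the slowest member determines latency.
        return max(member_latencies_ms)
    # Single device: members run one after another.
    return sum(member_latencies_ms)


def compute_cost_ms(member_latencies_ms: Sequence[float]) -> float:
    """Total device-time spent per query; parallel deployment does not reduce this."""
    return sum(member_latencies_ms)


if __name__ == "__main__":
    hypothetical = [4.0, 3.0, 9.0]  # illustrative per-member latencies, not measured
    print(ensemble_latency_ms(hypothetical, parallel=False))
    print(ensemble_latency_ms(hypothetical, parallel=True))
    print(compute_cost_ms(hypothetical))
```

The distinction matters for the rebuttal-style argument about parallel deployment: parallelism can cap latency at the slowest member, but serving cost still scales with the sum over all members.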
1. Focusing on a defense that is agnostic to the query distribution is a significant and timely contribution that aligns better with real-world MLaaS deployment scenarios. 2. This paper evaluates against diverse baselines (RandP, P-poison, GRAD, MeCo, ACT, DNF) under both data-based (DBME) and data-free (DFME) attack settings, using both soft and hard labels. 3. The paper is well written and clearly structured.
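The soft-label and hard-label settings the reviewer mentions differ only in how much of the model's output the API reveals per query. A minimal sketch, assuming a single probability vector per query; `api_response` is a hypothetical helper, not an interface from the paper.

```python
from typing import List, Sequence, Union


def api_response(probs: Sequence[float], hard_label: bool) -> Union[int, List[float]]:
    """What an MLaaS endpoint returns for one query under each setting."""
    if hard_label:
        # Hard-label setting: only the top-1 class index is revealed.
        return max(range(len(probs)), key=lambda c: probs[c])
    # Soft-label setting: the full probability vector is revealed.
    return list(probs)


if __name__ == "__main__":
    p = [0.1, 0.7, 0.2]
    print(api_response(p, hard_label=True))
    print(api_response(p, hard_label=False))
```

Hard labels leak less information per query, so attacks in that setting typically need more queries. Output-perturbation defenses also have less room to act there, because any perturbation that does not flip the argmax is invisible to the attacker.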
1. This paper mentions the range and tuning of hyperparameters but does not analyze their impact on attack performance in detail. 2. Training an ensemble of models via a bilevel optimization process is significantly more expensive than training a single model or applying a lightweight output perturbation. While the paper mentions parallel deployment, a more detailed discussion of the inference-time cost would be beneficial. 3. While data augmentation is a key part of simulating
1. The bilevel/trilevel optimization framework is mathematically coherent and provides a principled way to handle both data-based and data-free extraction. 2. The authors provide non-trivial theoretical analyses in Theorem 1 and Theorem 2 that connect defense performance with model capacity and distributional divergence. 3. The experiments are extensive and well-controlled, with multiple datasets, attacker/defender architectures, and clear ablation studies. The proposed method consistently outpe
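To make "bilevel" concrete, here is a generic form of an extraction-aware training objective. It is an illustrative template, not the paper's exact formulation. The assumptions are as follows. $f_{\theta_D}$ is the defender's output distribution and $f_{\theta_A}$ is a simulated attacker's surrogate. $\mathcal{D}$ is the benign data distribution and $\mathcal{Q}$ is a simulated query distribution (for instance, augmented benign data). $\ell$ is a task loss, and $\lambda > 0$ trades utility against resistance to extraction.

```latex
\begin{aligned}
\min_{\theta_D}\;& \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell\big(f_{\theta_D}(x),y\big)\big]
  \;-\;\lambda\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell\big(f_{\theta_A^{*}(\theta_D)}(x),y\big)\big]\\
\text{s.t.}\;& \theta_A^{*}(\theta_D)\in\arg\min_{\theta_A}\;
  \mathbb{E}_{x\sim\mathcal{Q}}\Big[\mathrm{KL}\big(f_{\theta_D}(x)\,\big\|\,f_{\theta_A}(x)\big)\Big]
\end{aligned}
```

The outer level keeps the defender accurate while making the best-response surrogate from the inner level perform poorly. In a data-free setting, $\mathcal{Q}$ is no longer fixed. It is produced by a query generator that is itself optimized (in DFME-style attacks, to maximize disagreement between surrogate and victim), which adds a third level and is one way a trilevel structure arises.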
1. While the integration of data augmentation, ensemble distillation, and bilevel optimization is elegant, each element individually is well-established. The conceptual leap feels incremental rather than groundbreaking. 2. The authors are encouraged to further clarify how their proposed formulation aligns with or extends beyond the traditional OOD-based defense assumption. 3. All experiments use image classification benchmarks; no exploration is done on NLP or tabular domains, which limits gene
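For reference, the "ensemble distillation" ingredient usually builds on the standard knowledge-distillation loss. That loss is the KL divergence between temperature-softened teacher and student outputs, scaled by $T^2$. The sketch below implements that textbook loss. Whether MISLEADER uses this exact form, temperature, or scaling is an assumption not confirmed by this summary.

```python
import math
from typing import List, Sequence


def softmax_t(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-softened, numerically stable softmax."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened outputs, scaled by T^2 (standard KD convention)."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

The loss is zero when student and teacher logits match and grows as their softened distributions diverge. The $T^2$ factor keeps gradient magnitudes roughly comparable across temperatures.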
1. Defending against model extraction without relying on OOD-query detection is practical in real-world scenarios. 2. The optimization process is simple and intuitive to implement.
1. Limited novelty: The core idea signaled by the title, using an ensemble of models to defend against model extraction, is similar to EDM [1]. While the authors claim to use data augmentation to approximate attacker queries, there is no analysis demonstrating that such augmentation truly approximates attacker queries. Moreover, the augmentation techniques used are standard practices in machine learning. 2. Incorrect highlight: In Table 1, under the DFMS-HL attack on CIFAR-100 with the clo
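The reviewer's point concerns using standard augmentations to stand in for attacker queries. The sketch below shows what such a simulated query set might look like. The specific transforms (random horizontal flip, small zero-padded translation, clipped Gaussian noise) and the helper names are assumptions chosen for illustration. This summary does not list the augmentations MISLEADER actually uses.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale H x W image, pixel values in [0, 1]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def shift(img: Image, dy: int, dx: int) -> Image:
    """Translate by (dy, dx) pixels, filling exposed borders with zeros."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            si, sj = i - dy, j - dx
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = img[si][sj]
    return out


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add Gaussian noise and clip back to the valid pixel range."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def simulate_queries(images: List[Image], n_per_image: int = 4,
                     max_shift: int = 2, sigma: float = 0.05, seed: int = 0) -> List[Image]:
    """Build a hypothetical surrogate query set from benign images via augmentation."""
    rng = random.Random(seed)
    queries = []
    for img in images:
        for _ in range(n_per_image):
            a = hflip(img) if rng.random() < 0.5 else img
            a = shift(a, rng.randint(-max_shift, max_shift), rng.randint(-max_shift, max_shift))
            queries.append(add_noise(a, sigma, rng))
    return queries
```

Such a set stays close to the benign distribution by construction. That closeness is the reason the reviewer asks for evidence that it also covers the queries a real attacker (for example, a data-free generator) would issue.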
Taxonomy
Topics: Machine Learning and Data Classification · Image Processing and 3D Reconstruction
