MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models

Xueqi Cheng; Minxing Zheng; Shixiang Zhu; Yushun Dong

arXiv:2506.02362 · cs.CR · June 4, 2025

TL;DR

MISLEADER is a novel defense method against model extraction attacks that uses ensembles of distilled models and data augmentation, effectively balancing model utility and robustness without relying on out-of-distribution query assumptions.

Contribution

The paper introduces MISLEADER, a new defense framework that employs ensembles of distilled models and data augmentation to prevent model extraction without OOD query assumptions.
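As a rough illustration of what serving an ensemble involves, the sketch below answers one query by running every member and averaging their softmax outputs. The averaging rule, the three tiny linear "models", and the names `make_linear_model` and `ensemble_predict` are assumptions made for this example; the page does not state how MISLEADER aggregates its distilled members.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_linear_model(weights):
    """Hypothetical stand-in for one ensemble member: features -> class logits."""
    def model(x):
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    return model

# Three toy members with different weights, standing in for heterogeneous
# architectures (illustrative values only, not taken from the paper).
members = [
    make_linear_model([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]),
    make_linear_model([[0.8, 0.1], [0.2, 1.2], [-0.9, -1.1]]),
    make_linear_model([[1.3, -0.2], [-0.1, 0.7], [-1.0, -0.8]]),
]

def ensemble_predict(members, x):
    """Run every member on x and average the per-class probabilities."""
    member_probs = [softmax(model(x)) for model in members]
    n_classes = len(member_probs[0])
    return [
        sum(p[c] for p in member_probs) / len(member_probs)
        for c in range(n_classes)
    ]

probs = ensemble_predict(members, [0.5, -0.3])
print(probs)
```

Each query triggers one forward pass per member, so inference cost grows linearly with ensemble size unless the members run in parallel.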

Findings

1. Effective in reducing model extractability in experiments
2. Maintains high predictive accuracy on benign inputs
3. Provides theoretical bounds on defense performance

Abstract

Model extraction attacks aim to replicate the functionality of a black-box model through query access, threatening the intellectual property (IP) of machine-learning-as-a-service (MLaaS) providers. Defending against such attacks is challenging, as it must balance efficiency, robustness, and utility preservation in the real-world scenario. Despite the recent advances, most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. However, this assumption is increasingly unreliable, as modern models are trained on diverse datasets and attackers often operate under limited query budgets. As a result, the effectiveness of these defenses is significantly compromised in realistic deployment scenarios. To address this gap, we propose MISLEADER (enseMbles of dIStiLled modEls Against moDel ExtRaction), a novel…
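To make the threat model concrete, here is a minimal sketch of query-based extraction on a toy problem. It is not an attack from the paper: the victim is a hypothetical two-feature logistic model, `victim_api` stands in for a black-box MLaaS endpoint that returns soft labels, and the attacker fits a surrogate using only query responses. The queries are drawn from the same input range as the test points, which mirrors the in-distribution setting the abstract describes.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical victim parameters; the attacker never reads these directly.
_SECRET_W = (2.0, -1.0)
_SECRET_B = 0.5

def victim_api(x):
    """Black-box endpoint: returns the soft label P(y=1 | x) for one query."""
    return sigmoid(_SECRET_W[0] * x[0] + _SECRET_W[1] * x[1] + _SECRET_B)

def extract_surrogate(n_queries=200, epochs=300, lr=0.5, seed=0):
    """Fit a logistic surrogate to the victim's soft labels by gradient descent."""
    rng = random.Random(seed)
    queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_queries)]
    targets = [victim_api(x) for x in queries]  # the only information the attacker uses
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, t in zip(queries, targets):
            # Gradient of cross-entropy with a soft target t
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - t
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w[0] -= lr * grad_w[0] / n_queries
        w[1] -= lr * grad_w[1] / n_queries
        b -= lr * grad_b / n_queries
    return w, b

def fidelity(w, b, n_test=1000, seed=1):
    """Fraction of fresh inputs where surrogate and victim agree on the hard label."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_test):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        surrogate_label = sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5
        victim_label = victim_api(x) >= 0.5
        agree += surrogate_label == victim_label
    return agree / n_test

w, b = extract_surrogate()
print("surrogate fidelity:", fidelity(w, b))
```

Fidelity, the agreement rate between surrogate and victim, is one common way to quantify how much functionality an attacker has replicated; a defense aims to lower it while keeping the victim's accuracy on benign inputs.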

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 5

Strengths

Addresses a Valid Limitation: The paper clearly identifies a practical weakness in many existing model extraction defenses: their dependency on OOD detection. The goal of creating a defense that is robust to in-distribution queries is well-motivated from a security perspective.

Strong Empirical Validation: The experimental evaluation is thorough within its defined scope. The authors compare MISLEADER against a wide array of SOTA baselines across multiple datasets (MNIST, CIFAR-10, CIFAR-100) a

Weaknesses

Significant Practicality Concerns (Overhead): The primary weakness of this paper is the practicality of the proposed solution. MISLEADER relies on an ensemble of heterogeneous models (e.g., ResNet18_8x, MobileNetV2, DenseNet121). While this architectural diversity is key to the defense, it introduces a massive computational overhead. Both training and, more critically, inference require running multiple distinct models for every single query. For a real-world MLaaS provider, where inference laten

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. Focusing on a defense that is agnostic to the query distribution is a significant and timely contribution that aligns better with real-world MLaaS deployment scenarios.
2. This paper evaluates against diverse baselines (RandP, P-poison, GRAD, MeCo, ACT, DNF) under both data-based (DBME) and data-free (DFME) attack settings, using both soft and hard labels.
3. The paper is well written and clearly structured.

Weaknesses

1. This paper mentions the range and tuning of hyperparameters, but does not analyze the impact of hyperparameters on attack performance in detail.
2. Training an ensemble of models via a bilevel optimization process is significantly more expensive than training a single model or applying a lightweight output perturbation. While the paper mentions parallel deployment, a more detailed discussion on the inference-time cost would be beneficial.
3. While data augmentation is a key part of simulating

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The bilevel/trilevel optimization framework is mathematically coherent and provides a principled way to handle both data-based and data-free extraction.
2. The authors provide non-trivial theoretical analyses in Theorem 1 and Theorem 2 that connect defense performance with model capacity and distributional divergence.
3. The experiments are extensive and well-controlled, with multiple datasets, attacker/defender architectures, and clear ablation studies. The proposed method consistently outpe

Weaknesses

1. While the integration of data augmentation, ensemble distillation, and bilevel optimization is elegant, each element individually is well-established. The conceptual leap feels incremental rather than groundbreaking.
2. The authors are encouraged to further clarify how their proposed formulation aligns with or extends beyond the traditional OOD-based defense assumption.
3. All experiments use image classification benchmarks; no exploration is done on NLP or tabular domains, which limits gene

Reviewer 04 · Rating 2 · Confidence 4

Strengths

1. Defending against model extraction without relying on OOD-query detection is practical in real-world scenarios.
2. The optimization process is simple and intuitive to implement.

Weaknesses

1. Limited novelty: The core idea, as the title suggests, of using an ensemble of models to defend against model extraction is similar to EDM [1]. While the authors claim to use data augmentation to approximate attacker queries, there is no analysis demonstrating that such augmentation truly approximates attacker queries. Moreover, the augmentation techniques used are standard practices in machine learning.
2. Incorrect highlight: In Table 1, under the DFMS-HL attack on CIFAR-100 with the clo

Code & Models

Repositories

labrai/misleader · pytorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Image Processing and 3D Reconstruction