Towards Undistillable Models by Minimizing Conditional Mutual Information
Linfeng Ye, Shayan Mohajer Hamidi, En-hui Yang

TL;DR
This paper introduces a novel training method called CMIM that minimizes conditional mutual information to produce deep neural networks that are resistant to knowledge distillation, thereby protecting model intellectual property.
Contribution
The paper proposes a new training approach that effectively creates undistillable DNNs by jointly minimizing cross entropy and conditional mutual information across clusters.
Findings
CMIM models are undistillable by existing KD methods.
CMIM models outperform standard CE-trained models in accuracy.
In CMIM models, the output probability distributions for each label form highly concentrated clusters, which makes the models resistant to distillation.
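One way to make "cluster concentration" concrete is to estimate the conditional mutual information empirically. The sketch below is an illustration, not the authors' implementation. It assumes the CMI of a model can be estimated as the average KL divergence from each sample's output distribution to the centroid (mean output distribution) of its label's cluster. The function names `kl_divergence` and `empirical_cmi` and the toy data are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def empirical_cmi(outputs, labels):
    """Average KL from each output distribution to its label's centroid.

    Assumption: this plug-in estimate stands in for I(X; Y_hat | Y).
    A value near zero means every cluster has collapsed to one distribution.
    """
    clusters = {}
    for p, y in zip(outputs, labels):
        clusters.setdefault(y, []).append(p)

    total = 0.0
    for cluster in clusters.values():
        n_classes = len(cluster[0])
        # Centroid: mean output distribution over samples sharing this label.
        centroid = [
            sum(p[k] for p in cluster) / len(cluster) for k in range(n_classes)
        ]
        total += sum(kl_divergence(p, centroid) for p in cluster)
    # Dividing by the sample count weights each cluster by its label frequency.
    return total / len(outputs)


# Toy data (hypothetical): two labels, two samples each.
toy_labels = [0, 0, 1, 1]
collapsed = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
spread = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]

print(empirical_cmi(collapsed, toy_labels))  # fully collapsed clusters
print(empirical_cmi(spread, toy_labels))     # dispersed clusters
```

Under this estimate, the collapsed clusters give a value of essentially zero, while the dispersed clusters give a strictly positive value. That matches the intuition that lower CMI means tighter clusters.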
Abstract
A deep neural network (DNN) is said to be undistillable if, when used as a black-box input-output teacher, it cannot be distilled through knowledge distillation (KD). In this case, the distilled student (referred to as the knockoff student) does not outperform a student trained independently with label smoothing (LS student) in terms of prediction accuracy. To protect intellectual property of DNNs, it is desirable to build undistillable DNNs. To this end, it is first observed that an undistillable DNN may have the trait that each cluster of its output probability distributions in response to all sample instances with the same label should be highly concentrated to the extent that each cluster corresponding to each label should ideally collapse into one probability distribution. Based on this observation and by measuring the concentration of each cluster in terms of conditional mutual…
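The undistillability criterion in the abstract compares two students: a knockoff student trained on the teacher's black-box outputs and an LS student trained independently with label smoothing. The following is a minimal sketch of the pieces involved, assuming the standard label-smoothing target and a cross-entropy loss against soft targets. The smoothing value `epsilon=0.1` and all function names are assumptions for illustration, not values taken from the paper.

```python
import math


def label_smoothed_target(label, n_classes, epsilon=0.1):
    """One-hot target mixed with the uniform distribution (label smoothing)."""
    return [
        (1.0 - epsilon) * (1.0 if k == label else 0.0) + epsilon / n_classes
        for k in range(n_classes)
    ]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy H(target, predicted) in nats.

    The LS student uses a label-smoothed target here; a knockoff student
    would instead use the teacher's output distribution as the target.
    """
    return -sum(t * math.log(q + eps) for t, q in zip(target, predicted))


def is_undistillable(knockoff_accuracies, ls_accuracy):
    """True if no knockoff student outperforms the LS student."""
    return all(acc <= ls_accuracy for acc in knockoff_accuracies)


# Example: the smoothed target for label 1 among 4 classes.
target = label_smoothed_target(1, 4)
print(target)
print(cross_entropy(target, [0.05, 0.85, 0.05, 0.05]))
```

In this framing, `is_undistillable` would be evaluated over the accuracies of students produced by each KD method under test, against the accuracy of an LS student with the same architecture.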
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The paper's goal is to prevent the misuse of models, and it serves as a partial privacy-protection technique, which is significant for the reliable use of AI models. 2. The paper provides both theoretical and empirical evidence to demonstrate the benefits of the proposed method in enhancing the undistillability of models. T
1. The writing quality of the paper could be enhanced. 2. From a methodological perspective, the overall contribution of the paper is somewhat limited. The paper utilizes an existing metric, CMI, to measure the compactness of model outputs and aims to enhance model undistillability by maximizing this compactness metric. This approach appears too trivial and straightforward. It is not clear how this method fundamentally differs from directly employing a maximum entropy term or label smoothing te
1. The idea is intuitive and with sufficient details. 2. The benchmark defense and knowledge distillation (attack) methods are exhaustive in the experiments. 3. The paper is well-organized and easy to follow.
1. A discussion of the proposed method’s limitations is missing. Because the proposed method collapses the logits so that each class’s output is highly concentrated (as shown in Fig. 2), the teacher model might become overly confident in its predictions. This can lead to poor calibration and degrade generalization on out-of-distribution (OoD) data. Therefore, more settings and evaluations of the protected teacher model’s performance beyond prediction accuracy are necessary. 2. The pro
- The topic of undistillable models is highly relevant, particularly given the growing online prevalence of large closed-source models. - The paper is mostly well-written with mostly appropriately supported claims. - The authors provide a nice balance between theoretical and empirical results.
- L040: Missing reference to theoretical paper by Borup and Andersen (2021), “Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation,” NeurIPS. - *"An insight is provided that in order for a DNN to be undistillable, it is desirable for the DNN to possess the trait that each cluster of the DNN’s output probability distributions corresponding to each label is highly concentrated to the extent that all probability distributions within the cluster m
* The paper introduces a novel objective based on conditional mutual information that includes optimizing over a power-transformed probability distribution. * Approximating the intractable terms of the objective is original although I am not sure that it is justified. I validated Theorem 4.1. * I am not an expert in this field, but the experimental part seems very comprehensive in terms of datasets, student and teacher networks, defense strategies, and compared methods. * The proposed approach s
* I believe that further justification, evidence, or analysis (theoretical or empirical) is required to relate the approximation of the second term in the objective to the original one (as $\omega$ was taken to be a finite number). There is some discrepancy that needs to be settled as eventually instead of maximizing over $\mathbf{\alpha}$ (which makes sense), averaging is done over multiple values. Also, can you please share what values of $\omega$ were used in the paper? I didn't find this inf
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
