Debiased Distillation by Transplanting the Last Layer
Jiwoon Lee, Jaeho Lee

TL;DR
This paper introduces DeTT, a simple yet effective knowledge distillation method that debiases student models by transplanting the teacher's last layer and reweighting samples, improving worst-group accuracy without needing bias annotations.
Contribution
DeTT is a novel distillation approach that leverages last-layer transplanting and feature matching to mitigate dataset bias during post-processing.
Findings
DeTT outperforms baseline methods in worst-group accuracy.
DeTT effectively debiases student models without requiring bias annotations.
The last layer plays a crucial role in debiasing during distillation.
Abstract
Deep models are susceptible to learning spurious correlations, even during post-processing. We take a closer look at knowledge distillation, a popular post-processing technique for model compression, and find that distilling with biased training data gives rise to a biased student, even when the teacher is debiased. To address this issue, we propose a simple knowledge distillation algorithm, coined DeTT (Debiasing by Teacher Transplanting). Inspired by a recent observation that the last neural net layer plays an overwhelmingly important role in debiasing, DeTT directly transplants the teacher's last layer to the student. The remaining layers are distilled by matching the feature map outputs of the student and the teacher, where the samples are reweighted to mitigate the dataset bias. Importantly, DeTT does not rely on the availability of extensive annotations on the bias-related…
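The two steps named in the abstract (transplanting the last layer, then matching features under per-sample weights) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. All function names are hypothetical. The squared-L2 feature distance, the fixed `up_weight` factor, and the `flagged` list marking samples to upweight are assumptions, since the abstract does not specify the loss or the reweighting rule. The sketch also assumes the student and teacher feature vectors have the same dimension, so the teacher's last layer can be reused as is.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def transplant_last_layer(teacher_head: Tuple[Matrix, Vector]) -> Tuple[Matrix, Vector]:
    """Return a copy of the teacher's last (linear) layer for the student.

    The copy is meant to stay frozen: only the student's feature
    extractor would be trained afterwards.
    """
    weights, bias = teacher_head
    return [row[:] for row in weights], bias[:]


def linear(head: Tuple[Matrix, Vector], features: Vector) -> Vector:
    """Apply a linear last layer: logits = W @ features + b."""
    weights, bias = head
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def make_sample_weights(flagged: Sequence[bool], up_weight: float) -> Vector:
    """Assumed reweighting rule: flagged samples get `up_weight`, others 1.0."""
    return [up_weight if is_flagged else 1.0 for is_flagged in flagged]


def weighted_feature_matching_loss(student_feats: Matrix,
                                   teacher_feats: Matrix,
                                   sample_weights: Vector) -> float:
    """Weighted average of squared L2 distances between feature vectors."""
    total = 0.0
    for s, t, w in zip(student_feats, teacher_feats, sample_weights):
        total += w * sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / sum(sample_weights)
```

In a training loop, the loss above would be minimized with respect to the student's feature extractor only, while predictions at test time would come from `linear(student_head, student_features)` using the transplanted head.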
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
Methods: Knowledge Distillation
