Adaptive Group Robust Ensemble Knowledge Distillation
Patrik Kenfack, Ulrich Aïvodji, Samira Ebrahimi Kahou

TL;DR
This paper introduces AGRE-KD, a novel ensemble knowledge distillation method that improves performance on underrepresented subgroups by selectively combining teacher models, outperforming traditional ensemble approaches.
Contribution
The paper proposes AGRE-KD, a new ensemble distillation strategy that enhances subgroup robustness by selectively integrating debiased teacher models based on gradient analysis.
Findings
AGRE-KD outperforms traditional ensemble distillation in subgroup performance.
The method effectively leverages an additional biased model to improve worst-case subgroup accuracy.
Experiments show AGRE-KD surpasses classic ensemble methods like majority voting.
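For reference, the "traditional ensemble distillation" baseline mentioned above is commonly implemented by averaging the teachers' temperature-softened predictions and training the student to match that average. The following is a minimal pure-Python sketch of that baseline for a single example. The function names, the temperature value, and the optional `weights` argument are illustrative assumptions. In particular, `weights` is only a placeholder for some per-teacher weighting; it does not reproduce AGRE-KD's gradient-based selection rule, which this summary does not specify.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def ensemble_targets(teacher_logits, weights=None, temperature=2.0):
    """Weighted average of the teachers' softened distributions.

    weights=None gives the uniform average used by plain ensemble
    distillation. Non-uniform weights are a hypothetical hook, not the
    paper's weighting rule.
    """
    n = len(teacher_logits)
    if weights is None:
        weights = [1.0] * n
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    probs = [softmax(t, temperature) for t in teacher_logits]
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / total_w
        for c in range(num_classes)
    ]


def distillation_loss(student_logits, teacher_logits, weights=None,
                      temperature=2.0):
    """KL(ensemble target || student) on softened outputs, scaled by T^2."""
    target = ensemble_targets(teacher_logits, weights, temperature)
    student_log = log_softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - s_log)
        for t, s_log in zip(target, student_log)
        if t > 0.0
    )
    return temperature ** 2 * kl
```

With uniform weights every teacher contributes equally to the target, so a single teacher that relies on a spurious feature can pull the averaged target toward it. A non-uniform `weights` vector is one place where a selective scheme could plug in.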
Abstract
Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex teacher model to a relatively "simple" student model. Prior work has shown that ensemble deep learning methods can improve the performance of the worst-case subgroups; however, it is unclear if this advantage carries over when distilling knowledge from an ensemble of teachers, especially when the teacher models are debiased. This study demonstrates that traditional ensemble knowledge distillation can significantly drop the performance of the worst-case subgroups in the distilled student model even when the teacher models are debiased. To overcome this, we propose Adaptive Group Robust Ensemble Knowledge Distillation (AGRE-KD), a simple ensembling…
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition
Methods: Knowledge Distillation
