Adaptive Group Robust Ensemble Knowledge Distillation

Patrik Kenfack; Ulrich Aïvodji; Samira Ebrahimi Kahou

arXiv:2411.14984 · cs.LG · November 11, 2025

TL;DR

This paper introduces AGRE-KD, a novel ensemble knowledge distillation method that improves performance on underrepresented subgroups by selectively combining teacher models, outperforming traditional ensemble approaches.

Contribution

The paper proposes AGRE-KD, a new ensemble distillation strategy that enhances subgroup robustness by selectively integrating debiased teacher models based on gradient analysis.
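
This page describes the teacher selection only as "based on gradient analysis" and does not state the exact rule. The sketch below assumes one plausible reading. Each teacher gets a score equal to one minus the cosine similarity between its distillation gradient and the gradient obtained by distilling from the biased model. The scores are then normalized into weights, so teachers that push the student away from the biased direction count more. The functions `teacher_weights` and `weighted_kd_loss` and the scoring formula are hypothetical and are not the authors' code. Gradients are assumed to be already flattened into plain lists of floats.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def teacher_weights(teacher_grads: List[Sequence[float]],
                    biased_grad: Sequence[float]) -> List[float]:
    """Hypothetical weighting: score_t = 1 - cos(g_t, g_biased), in [0, 2],
    normalized to sum to 1. Teachers aligned with the biased model get ~0."""
    scores = [1.0 - cosine_similarity(g, biased_grad) for g in teacher_grads]
    total = sum(scores)
    if total == 0.0:
        # Every teacher points exactly along the biased gradient: fall back to uniform.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_kd_loss(per_teacher_losses: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Combine per-teacher distillation losses with the weights above."""
    return sum(w * loss for w, loss in zip(weights, per_teacher_losses))


if __name__ == "__main__":
    biased = [1.0, 0.0]
    grads = [[1.0, 0.0],   # same direction as the biased model
             [0.0, 1.0],   # orthogonal
             [-1.0, 0.0]]  # opposite direction
    w = teacher_weights(grads, biased)
    print([round(x, 3) for x in w])  # [0.0, 0.333, 0.667]
    print(round(weighted_kd_loss([0.9, 0.6, 0.3], w), 3))  # 0.4
```

Using cosine similarity rather than raw dot products makes the weights depend only on gradient direction, not magnitude. This matches the idea of favoring teachers whose guidance differs from the biased model's.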

Findings

1. AGRE-KD outperforms traditional ensemble distillation on subgroup performance.

2. The method effectively leverages an additional biased model to improve worst-case subgroup accuracy.

3. Experiments show that AGRE-KD surpasses classic ensemble methods such as majority voting.
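
Worst-case subgroup (worst-group) accuracy, the metric behind these findings, is commonly computed by measuring accuracy separately within each subgroup and taking the minimum. The sketch below assumes subgroup labels are available at evaluation time. The function names are illustrative, and the toy data shows how a high average accuracy can hide a failing minority group.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def group_accuracies(preds: Sequence[int], labels: Sequence[int],
                     groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately for each subgroup id."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(preds: Sequence[int], labels: Sequence[int],
                         groups: Sequence[Hashable]) -> float:
    """Minimum per-group accuracy: the score of the worst-served subgroup."""
    return min(group_accuracies(preds, labels, groups).values())


if __name__ == "__main__":
    preds = [1, 1, 1, 1, 0]
    labels = [1, 1, 1, 1, 1]
    groups = ["majority"] * 4 + ["minority"]
    average = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    print(average)  # 0.8
    print(worst_group_accuracy(preds, labels, groups))  # 0.0
```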

Abstract

Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex teacher model to a relatively "simple" student model. Prior work has shown that ensemble deep learning methods can improve the performance of the worst-case subgroups; however, it is unclear whether this advantage carries over when distilling knowledge from an ensemble of teachers, especially when the teacher models are debiased. This study demonstrates that traditional ensemble knowledge distillation can significantly degrade the performance of the worst-case subgroups in the distilled student model, even when the teacher models are debiased. To overcome this, we propose Adaptive Group Robust Ensemble Knowledge Distillation (AGRE-KD), a simple ensembling…
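
For reference, a common form of the "traditional" ensemble knowledge distillation mentioned in the abstract works as follows. It averages the teachers' temperature-softened class distributions with equal weights. The student is then trained to match that average through a KL-divergence loss scaled by T^2, as in Hinton-style distillation. This is a minimal sketch for a single example under that assumption. The paper's exact baseline and hyperparameters may differ, and the default temperature of 4.0 is an arbitrary illustrative choice. Because every teacher counts equally, nothing in this loss favors teachers that help underrepresented subgroups. That is consistent with the weakness the abstract describes.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax (max subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def ensemble_kd_loss(teacher_logits: List[Sequence[float]],
                     student_logits: Sequence[float],
                     temperature: float = 4.0) -> float:
    """Uniform-average ensemble KD loss for one example."""
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    n = len(teacher_probs)
    num_classes = len(student_logits)
    # Every teacher gets the same weight 1/n, regardless of its subgroup behavior.
    target = [sum(p[k] for p in teacher_probs) / n for k in range(num_classes)]
    student_probs = softmax(student_logits, temperature)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl_divergence(target, student_probs)


if __name__ == "__main__":
    # Two confident teachers that disagree average out to a uniform target.
    teachers = [[2.0, 0.0], [0.0, 2.0]]
    print(ensemble_kd_loss(teachers, [0.0, 0.0]) < 1e-9)  # True
    print(ensemble_kd_loss(teachers, [3.0, 0.0]) > 0.0)   # True
```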

Taxonomy

Topics: Neural Networks and Applications · Face and Expression Recognition

Methods: Knowledge Distillation