Reliability Gated Multi-Teacher Distillation for Low Resource Abstractive Summarization
Dipto Sumit, Ankan Kumar Roy, Sadia Khair Rodela, Atia Haque Asha, Mourchona Afrin, Niloy Farhan, Farig Yousuf Sadeque

TL;DR
This paper introduces a reliability-aware multi-teacher distillation framework for low-resource abstractive summarization, demonstrating improved performance and insights into when multi-teacher supervision is most effective.
Contribution
It proposes EWAD and CPDP mechanisms for more reliable distillation, and provides extensive experiments across languages and datasets to analyze their effectiveness.
Findings
Logit-level KD yields the most reliable gains.
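More complex distillation improves semantic similarity for short summaries but degrades longer outputs.
Cross-lingual pseudo-label KD across ten languages retains 71-122 percent of teacher ROUGE-L at 3.2x compression.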
Abstract
We study multi-teacher knowledge distillation for low-resource abstractive summarization from a reliability-aware perspective. We introduce EWAD (Entropy Weighted Agreement Aware Distillation), a token-level mechanism that routes supervision between teacher distillation and gold supervision based on inter-teacher agreement, and CPDP (Capacity Proportional Divergence Preservation), a geometric constraint on the student's position relative to heterogeneous teachers. Across two Bangla datasets, 13 BanglaT5 ablations, and eight Qwen2.5 experiments, we find that logit-level KD provides the most reliable gains, while more complex distillation improves semantic similarity for short summaries but degrades longer outputs. Cross-lingual pseudo-label KD across ten languages retains 71-122 percent of teacher ROUGE-L at 3.2x compression. A human-validated multi-judge LLM evaluation further reveals…
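The abstract describes routing each token's supervision between teacher distillation and gold labels according to inter-teacher agreement. The sketch below illustrates that general idea only; it is not the paper's EWAD formula, which this page does not give. It assumes two teachers, measures agreement as one minus the normalized Jensen-Shannon divergence between their token distributions, and uses that value to mix a KL distillation term with gold cross-entropy. All function names, the choice of Jensen-Shannon divergence, and the omission of any entropy weighting are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over one token's vocabulary logits.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) in nats; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # Jensen-Shannon divergence; bounded above by ln(2) with natural logs.
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

def agreement_gate(teacher_a, teacher_b):
    # Assumed gate: 1.0 when the teachers agree exactly,
    # approaching 0.0 when their distributions do not overlap.
    jsd = js_divergence(teacher_a, teacher_b)
    return max(0.0, min(1.0, 1.0 - jsd / math.log(2)))

def gated_token_loss(student_logits, teacher_logits_a, teacher_logits_b,
                     gold_index, temperature=1.0):
    # Hypothetical per-token loss: gate * KD + (1 - gate) * gold CE.
    student = softmax(student_logits, temperature)
    teacher_a = softmax(teacher_logits_a, temperature)
    teacher_b = softmax(teacher_logits_b, temperature)
    gate = agreement_gate(teacher_a, teacher_b)
    teacher_mix = [(a + b) / 2 for a, b in zip(teacher_a, teacher_b)]
    kd_term = kl_divergence(teacher_mix, student)
    ce_term = -math.log(softmax(student_logits)[gold_index] + 1e-12)
    return gate * kd_term + (1.0 - gate) * ce_term, gate
```

In this sketch, tokens on which the teachers agree are supervised mostly by the averaged teacher distribution, while tokens on which they disagree fall back to the gold label. A real implementation would operate on batched tensors and would likely handle more than two teachers.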
