GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation
Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin

TL;DR
GOVERN is a novel ensemble algorithm that effectively combines multiple teacher models during unsupervised distillation, significantly improving student model performance with minimal inference cost in question-answering systems.
Contribution
We introduce GOVERN, a new method for ensembling knowledge from multiple teachers during label-free distillation, achieving high performance with low computational overhead.
Findings
GOVERN achieves 99.5% of teacher ensemble performance.
Requires only 1% of the ensemble inference budget.
Successfully deployed in a real-world QA system.
Abstract
Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. However, practical deployment requires knowledge distillation so that high performance is maintained under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student model performance, how can knowledge from multiple teacher models be effectively ensembled during this stage without the guidance of labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN has demonstrated significant improvements in both offline and online experiments, enabling the student model to achieve results comparable to those of teacher ensembles. Our experiments show that GOVERN requires a mere 1% of the ensemble method's inference budget to achieve 99.5% of…
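The abstract does not describe GOVERN's mechanism, so the sketch below only illustrates the setting it addresses: distilling several teachers into one student on unlabeled data. It shows a simple baseline, averaging the teachers' relevance scores into soft labels and fitting the student with a mean-squared-error loss. This is not GOVERN; the function names, the score averaging, and the MSE loss are all hypothetical choices made for illustration.

```python
from statistics import fmean


def ensemble_soft_labels(teacher_scores):
    """Hypothetical baseline: average each example's score across teachers.

    teacher_scores: one list per teacher, each holding one float relevance
    score per unlabeled (query, passage) example. No ground-truth labels used.
    """
    n = len(teacher_scores[0])
    if any(len(scores) != n for scores in teacher_scores):
        raise ValueError("all teachers must score the same examples")
    # zip(*...) groups the teachers' scores for the same example together.
    return [fmean(per_example) for per_example in zip(*teacher_scores)]


def distillation_loss(student_scores, soft_labels):
    """Mean squared error between student scores and ensemble soft labels."""
    if len(student_scores) != len(soft_labels):
        raise ValueError("student and soft labels must have the same length")
    return fmean((s - t) ** 2 for s, t in zip(student_scores, soft_labels))


if __name__ == "__main__":
    # Two hypothetical teachers scoring three unlabeled examples.
    teachers = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.8]]
    labels = ensemble_soft_labels(teachers)
    print(labels)
    print(distillation_loss([1.0, 0.3, 0.7], labels))
```

Plain averaging treats every teacher as equally reliable on every example. The question the abstract raises is how to combine teachers more effectively than such a fixed rule when no labels are available to arbitrate between them.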
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research · Advanced Manufacturing and Logistics Optimization
Methods: Knowledge Distillation
