Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model
Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

TL;DR
This paper introduces DiverseDistill, a knowledge distillation approach in which a teaching committee of foundation-model and complementary teachers transfers knowledge to specialized application models, improving their performance.
Contribution
The paper proposes a teaching committee of diverse teachers and a new distillation method, DiverseDistill, to bridge the gap between foundation models and specialized application models.
Findings
Complementary teachers improve student performance.
DiverseDistill outperforms baseline distillation methods.
DiverseDistill enhances knowledge transfer across the disparities between teacher and student models.
Abstract
Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient for serving. Techniques from knowledge distillation may be applied here, where the application model learns to mimic the foundation model. However, specialized application models and foundation models have substantial gaps in capacity, employing distinct architectures, using different input features from different modalities, and being optimized on different distributions. These differences in model characteristics lead to significant challenges for distillation methods. In this work, we propose…
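The "learns to mimic" step the abstract refers to can be illustrated with the classic soft-label distillation loss. The sketch below is a generic baseline, not the paper's DiverseDistill method: it assumes a single teacher and a student that produce logits over the same label set, which is exactly the assumption the abstract says breaks down across capacity, architecture, feature, and distribution gaps. The function names and the default temperature are illustrative choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumes both models score the same set of classes (hypothetical setup).
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2


# The loss is zero when the student matches the teacher and grows as they diverge.
teacher = [2.0, 0.5, -1.0]
matched = distillation_loss(teacher, teacher)
mismatched = distillation_loss([-1.0, 0.5, 2.0], teacher)
```

In practice this term is usually combined with the ordinary task loss on ground-truth labels; how the paper weights or replaces it for a committee of teachers is not stated in the truncated abstract above.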
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Teaching and Learning Programming · Online Learning and Analytics
Methods: Knowledge Distillation
