Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment

Xiandong Meng; Yan Wu; Yexin Tian; Xin Hu; Tianze Kang; Junliang Du

arXiv:2507.15198·cs.CL·July 22, 2025

TL;DR

This paper introduces a multi-teacher guided distillation approach that improves the performance of small language models, enabling faster and more resource-efficient deployment without sacrificing language understanding or generation quality.

Contribution

The paper proposes a collaborative distillation strategy in which the output distributions and intermediate features of multiple teacher models are integrated to improve the student model's performance and generalization.

Findings

01. Student models achieve lower perplexity and higher text generation quality.

02. The method outperforms existing distillation approaches on multiple evaluation metrics.

03. Enhanced semantic understanding and task adaptability are demonstrated across various NLP tasks.

Abstract

This paper addresses the challenges of high computational cost and slow inference in deploying large language models. It proposes a distillation strategy guided by multiple teacher models. The method constructs several teacher models and integrates their output probability distributions and intermediate semantic features. This guides the student model to learn from multiple sources of knowledge. As a result, the student model gains stronger language understanding and generation ability while maintaining a small parameter size. To achieve this, the paper introduces a weighted output fusion mechanism, a feature alignment loss function, and an entropy-driven dynamic teacher weighting strategy. These components improve the quality and stability of knowledge transfer during distillation. Under multi-teacher guidance, the student model captures semantic information more effectively and…
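
The three components named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: teacher weights are taken as a softmax over negative entropies (more confident teachers count more), fusion is a weighted average of temperature-softened distributions, feature alignment is a weighted mean squared error that assumes student and teacher features share a dimension, and the names `entropy_weights`, `fuse_outputs`, `feature_alignment_loss`, `distillation_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weights(teacher_probs):
    # Assumption: lower-entropy (more confident) teachers get larger weights,
    # computed as a softmax over negative entropies.
    return softmax([-entropy(p) for p in teacher_probs])

def fuse_outputs(teacher_probs, weights):
    """Weighted average of the teachers' output distributions."""
    vocab = len(teacher_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, teacher_probs))
            for i in range(vocab)]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), with eps guarding against log(0)."""
    return sum(t * math.log((t + eps) / (q + eps))
               for t, q in zip(target, predicted) if t > 0.0)

def feature_alignment_loss(student_feat, teacher_feats, weights):
    # Assumption: weighted MSE between feature vectors of equal length;
    # a real system would likely need a learned projection first.
    loss = 0.0
    for w, feat in zip(weights, teacher_feats):
        mse = sum((s - t) ** 2 for s, t in zip(student_feat, feat))
        loss += w * mse / len(student_feat)
    return loss

def distillation_loss(student_logits, teacher_logits, student_feat,
                      teacher_feats, temperature=2.0, alpha=0.5):
    """Output-fusion KL term plus alpha times the feature alignment term."""
    teacher_probs = [softmax(l, temperature) for l in teacher_logits]
    weights = entropy_weights(teacher_probs)
    fused = fuse_outputs(teacher_probs, weights)
    student_probs = softmax(student_logits, temperature)
    kd = kl_divergence(fused, student_probs)
    align = feature_alignment_loss(student_feat, teacher_feats, weights)
    return kd + alpha * align, weights

if __name__ == "__main__":
    loss, weights = distillation_loss(
        student_logits=[0.5, 0.2, 0.1],
        teacher_logits=[[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
        student_feat=[0.1, 0.3],
        teacher_feats=[[0.2, 0.4], [0.0, 0.5]],
    )
    print("teacher weights:", weights)
    print("combined loss:", loss)
```

In this sketch the weights are recomputed for every input, which is one plausible reading of "dynamic" weighting: a teacher that is uncertain on a given example contributes less to both the fused target and the feature alignment term.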
