Flipping Knowledge Distillation: Leveraging Small Models' Expertise to Enhance LLMs in Text Matching
Mingzhe Li, Jing Xiang, Qishen Zhang, Kaiyang Wan, Xiuying Chen

TL;DR
This paper introduces a flipped knowledge distillation method where large language models learn from smaller, specialized models to improve text matching, addressing architectural differences and enhancing performance in domain-specific tasks.
Contribution
The paper proposes a novel flipped knowledge distillation paradigm using encoder reinterpretation and Margin-aware Contrastive Learning to leverage small models' expertise for large language models.
Findings
Improved performance on financial and healthcare benchmarks.
Effective handling of positive and negative pair similarities.
Successful deployment in real-world online applications.
Abstract
Knowledge distillation typically involves transferring knowledge from a Large Language Model (LLM) to a Smaller Language Model (SLM). However, in tasks such as text matching, fine-tuned smaller models often yield more effective domain-specific representations, as they focus on optimizing the similarity of input pairs. To leverage both the specialized strengths of small models and the rich semantic understanding of LLMs, we introduce a flipped knowledge distillation paradigm, in which the LLM learns from the SLM. Specifically, we address the architectural gap between decoder-only LLMs and smaller encoder-based models by reinterpreting LLMs in an encoder-decoder manner using LoRA. The encoder generates compressed representations, while the decoder maps them to the output space. During training, the encoder produces representations and their similarities, which are then aligned with the similarity…
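The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it names the alignment target or the loss, so everything here is an assumption: the teacher scores are taken to be similarity scores from the fine-tuned small model, the loss is a plain mean squared error, and the names `cosine` and `similarity_alignment_loss` are hypothetical. This is not the paper's Margin-aware Contrastive Learning objective, only a simplified stand-in for "student similarities are pulled toward teacher similarities".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_alignment_loss(student_pairs, teacher_scores):
    """Mean squared error between student and teacher pair similarities.

    student_pairs: list of (left_vec, right_vec) compressed representations
                   from the LLM acting as encoder (assumed shape).
    teacher_scores: list of similarity scores for the same pairs, assumed to
                    come from the small fine-tuned model.
    """
    errors = [
        (cosine(left, right) - target) ** 2
        for (left, right), target in zip(student_pairs, teacher_scores)
    ]
    return sum(errors) / len(errors)


# Toy example: one matching pair and one non-matching pair.
student_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical vectors
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors
]
teacher_scores = [1.0, 0.0]  # teacher agrees with the student on both pairs
loss = similarity_alignment_loss(student_pairs, teacher_scores)
```

In a real training loop the vectors would be differentiable tensors and the loss would be minimized with respect to the LoRA parameters; the pure-Python version only shows the quantity being compared.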
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Advanced Graph Neural Networks
