Multi-Granularity Semantic Revision for Large Language Model Distillation
Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

TL;DR
This paper introduces a multi-granularity semantic revision approach for large language model distillation, addressing errors and misalignments in output by correcting sequences, adaptively focusing on semantic-rich outputs, and enforcing span-level probability consistency.
Contribution
It proposes a novel multi-level semantic revision framework incorporating sequence correction, adaptive loss, and span correlation constraints for improved LLM distillation.
Findings
Outperforms existing distillation methods across various model sizes.
Reduces generation errors and enhances semantic transfer.
Demonstrates effectiveness on models from 0.1B to 13B parameters.
Abstract
Knowledge distillation plays a key role in compressing Large Language Models (LLMs), boosting a small student model under the guidance of large teacher models. However, existing LLM distillation methods rely too heavily on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in prior work struggle to align the most informative parts because of the complex distribution of LLMs' outputs. To address these problems, we propose a multi-granularity semantic revision method for LLM distillation. At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy. SCRG first calculates the semantic cognitive difference between the teacher and student to detect the error token, then corrects it with the teacher-generated one, and re-generates the sequence to reduce…
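The detect, correct, and re-generate loop that the abstract describes for SCRG can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not define the "semantic cognitive difference", so the sketch assumes it is the absolute gap between the probabilities that the teacher and the student assign to the student's sampled token, compared against a hypothetical `threshold`. The function names and the `regenerate` callback are also illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence

Dist = Sequence[float]  # probability distribution over a toy vocabulary


def find_error_position(
    student_tokens: Sequence[int],
    teacher_dists: Sequence[Dist],
    student_dists: Sequence[Dist],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the first index where teacher and student disagree strongly.

    ASSUMPTION: disagreement is the absolute gap between the probabilities
    each model assigns to the student's sampled token; the paper's actual
    measure is not given in the abstract.
    """
    for i, tok in enumerate(student_tokens):
        gap = abs(teacher_dists[i][tok] - student_dists[i][tok])
        if gap > threshold:
            return i
    return None


def correct_and_regenerate(
    student_tokens: Sequence[int],
    teacher_dists: Sequence[Dist],
    student_dists: Sequence[Dist],
    regenerate: Callable[[List[int]], List[int]],
    threshold: float = 0.5,
) -> List[int]:
    """Replace the detected error token with the teacher's most likely
    token, then let `regenerate` (a hypothetical stand-in for student
    decoding) continue from the corrected prefix."""
    pos = find_error_position(student_tokens, teacher_dists, student_dists, threshold)
    if pos is None:
        return list(student_tokens)  # nothing to correct
    dist = teacher_dists[pos]
    teacher_token = max(range(len(dist)), key=lambda t: dist[t])
    prefix = list(student_tokens[:pos]) + [teacher_token]
    return prefix + list(regenerate(prefix))


# Toy data: vocabulary of 3 tokens, sequence of 3 positions.
student_tokens = [0, 2, 1]
teacher_dists = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
student_dists = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]

revised = correct_and_regenerate(
    student_tokens, teacher_dists, student_dists,
    regenerate=lambda prefix: [2],  # dummy continuation for illustration
)
print(revised)  # [0, 1, 2]
```

In this toy run, the student's second token (id 2) gets probability 0.8 from the student but only 0.1 from the teacher, so it is flagged, replaced with the teacher's top token (id 1), and the rest of the sequence is produced again from the corrected prefix.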
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: ALIGN
