Multi-Granularity Semantic Revision for Large Language Model Distillation

Xiaoyu Liu; Yun Zhang; Wei Li; Simiao Li; Xudong Huang; Hanting Chen; Yehui Tang; Jie Hu; Zhiwei Xiong; Yunhe Wang

arXiv:2407.10068·cs.CL·July 16, 2024

Open Access

TL;DR

This paper introduces a multi-granularity semantic revision approach for large language model distillation, addressing errors and misalignments in output by correcting sequences, adaptively focusing on semantic-rich outputs, and enforcing span-level probability consistency.

Contribution

It proposes a novel multi-level semantic revision framework incorporating sequence correction, adaptive loss, and span correlation constraints for improved LLM distillation.

Findings

01 Outperforms existing distillation methods across various model sizes.

02 Reduces generation errors and enhances semantic transfer.

03 Demonstrates effectiveness on models from 0.1B to 13B parameters.

Abstract

Knowledge distillation plays a key role in compressing Large Language Models (LLMs), boosting a small student model under the guidance of a large teacher model. However, existing LLM distillation methods rely too heavily on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in prior work struggle to align the most informative parts of the output because of the complex distribution of LLMs' outputs. To address these problems, we propose a multi-granularity semantic revision method for LLM distillation. At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy. SCRG first calculates the semantic cognitive difference between the teacher and student to detect the error token, then corrects it with the teacher-generated one, and re-generates the sequence to reduce…
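
The detect-and-correct step of SCRG described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define how the "semantic cognitive difference" is computed, so the example assumes a per-position KL divergence between teacher and student token distributions, and every function name here (`kl_divergence`, `find_error_position`, `correct_sequence`) is hypothetical. Re-generation is left out because it requires an actual student model.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def find_error_position(teacher_probs, student_probs):
    """Return the position where teacher and student disagree most.

    Assumption: disagreement is measured by per-position KL divergence.
    The paper's actual criterion may differ.
    """
    scores = [kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)]
    return max(range(len(scores)), key=scores.__getitem__)


def correct_sequence(student_tokens, teacher_probs, student_probs):
    """Replace the detected error token with the teacher's most likely token.

    Returns the corrected prefix and the error position. In a full pipeline
    the student would re-generate the rest of the sequence from this prefix.
    """
    pos = find_error_position(teacher_probs, student_probs)
    teacher_dist = teacher_probs[pos]
    teacher_token = max(range(len(teacher_dist)), key=teacher_dist.__getitem__)
    return student_tokens[:pos] + [teacher_token], pos


# Toy example: vocabulary of 3 tokens, sequence of 3 positions.
teacher_probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
student_probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
student_tokens = [0, 2, 2]

prefix, pos = correct_sequence(student_tokens, teacher_probs, student_probs)
print(prefix, pos)
```

In the toy data the student and teacher diverge most at the second position, so the prefix is cut there and the student's token is swapped for the teacher's preferred one.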

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: ALIGN