SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs

Haiduo Huang; Jiangcheng Song; Yadong Zhang; Pengju Ren

arXiv:2510.24021·cs.CL·November 18, 2025

TL;DR

SelecTKD is a selective, token-weighted knowledge distillation method for training compact LLMs. It concentrates the loss on student-proposed tokens that the teacher accepts and masks or down-weights the rest, reaching state-of-the-art results for small models without architectural changes.

Contribution

The paper proposes SelecTKD, a selective distillation framework that decides at each training step which tokens receive supervision, with the aim of making LLM compression more efficient and more stable.

Findings

1. Consistently improves baseline models across multiple tasks.
2. Achieves state-of-the-art results for small models.
3. Works with on- and off-policy data without architectural changes.

Abstract

Knowledge distillation (KD) is a standard route to compress Large Language Models (LLMs) into compact students, yet most pipelines uniformly apply token-wise loss regardless of teacher confidence. This indiscriminate supervision amplifies noisy, high-entropy signals and is especially harmful under large teacher-student capacity gaps. We introduce SelecTKD, a plug-and-play Selective Token-Weighted distillation framework that shifts the focus from "how to measure divergence" to "where to apply learning". At each step, the student proposes tokens that are verified by the teacher through a robust propose-and-verify procedure with two variants: greedy Top-k and non-greedy Spec-k. Accepted tokens receive full loss, while rejected tokens are masked or down-weighted. This objective-agnostic design works with on- and off-policy data, induces an implicit curriculum quantified by Token Acceptance…
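The propose-and-verify idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the acceptance rule (the student's greedy token must fall within the teacher's k highest-scoring tokens), the forward-KL per-token loss, the normalization by total weight, and every name below (`top_k_accept`, `selective_kd_loss`, `reject_weight`, `accepted_fraction`) are assumptions made for illustration. Only the greedy Top-k variant is sketched; Spec-k is not covered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_accept(student_logits, teacher_logits, k):
    """Assumed rule: the student proposes its argmax token, and the teacher
    accepts it if that token is among the teacher's k highest-scoring tokens."""
    vocab = range(len(student_logits))
    proposal = max(vocab, key=student_logits.__getitem__)
    teacher_top = sorted(vocab, key=teacher_logits.__getitem__, reverse=True)[:k]
    return proposal in teacher_top


def forward_kl(teacher_probs, student_probs):
    """KL(teacher || student) for one token position (one possible divergence)."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


def selective_kd_loss(student_seq, teacher_seq, k=2, reject_weight=0.0):
    """Weighted mean of per-token losses.

    Accepted tokens get weight 1.0; rejected tokens get `reject_weight`
    (0.0 masks them, a value in (0, 1) down-weights them).
    Returns (loss, accepted_fraction).
    """
    weights, losses, accepted_flags = [], [], []
    for s_logits, t_logits in zip(student_seq, teacher_seq):
        accepted = top_k_accept(s_logits, t_logits, k)
        accepted_flags.append(accepted)
        weights.append(1.0 if accepted else reject_weight)
        losses.append(forward_kl(softmax(t_logits), softmax(s_logits)))
    total_weight = sum(weights)
    if total_weight == 0.0:
        loss = 0.0  # every token was masked; nothing to learn from this step
    else:
        loss = sum(w * l for w, l in zip(weights, losses)) / total_weight
    accepted_fraction = sum(accepted_flags) / len(accepted_flags)
    return loss, accepted_fraction


# Toy example: two token positions over a three-token vocabulary.
student = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
teacher = [[2.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
loss, accepted_fraction = selective_kd_loss(student, teacher, k=1)
```

In the toy example the student agrees with the teacher at the first position and disagrees at the second, so with `k=1` and `reject_weight=0.0` only the first position contributes to the loss. Because the weights are computed independently of the per-token loss, the forward KL used here could be swapped for another divergence, which is one way to read the abstract's "objective-agnostic" claim.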

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification