Knowledge Distillation for Large Language Models

Alejandro Paredes La Torre; Barbara Flores; Diego Rodriguez

arXiv:2603.13765·cs.CL·March 17, 2026

Open Access

TL;DR

This paper compresses a large language model by combining knowledge distillation (Qwen 3B teacher, Qwen 0.5B student) with guided chain-of-thought reinforcement learning. The distilled student retains 70% to 95% of the teacher's performance across English, Spanish, and code tasks.

Contribution

The paper presents a framework that integrates knowledge distillation with guided chain-of-thought reinforcement learning to produce compact language models that retain most of the teacher's performance.
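As an illustration of the distillation half of the framework, here is a minimal sketch of a temperature-softened distillation loss on a single token position. The forward KL direction, the temperature value, and the function names `softmax` and `kd_loss` are assumptions for illustration; this page does not state the exact loss the authors use.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: forward KL with a shared temperature, the common
    Hinton-style formulation, not necessarily the paper's choice.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Hypothetical logits over a three-token vocabulary.
    print(kd_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In practice this term is usually averaged over all token positions and often mixed with the ordinary cross-entropy loss on the reference text.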

Findings

01. Distilled models retain 70-95% of teacher performance across tasks.

02. Chain-of-thought reinforcement improves reasoning and correctness.

03. Quantization reduces memory and inference latency.
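To make the quantization finding concrete, the following sketch shows symmetric round-to-nearest quantization of a weight vector to signed 4-bit codes. The per-tensor scale, the code range of -7 to 7, and the helper names `quantize_4bit` and `dequantize` are assumptions; the scheme actually used in the paper is not described on this page.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integer codes in [-7, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # avoid dividing by zero
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Hypothetical weights, chosen only for illustration.
    weights = [0.7, -0.3, 0.1, 0.0]
    codes, scale = quantize_4bit(weights)
    print(codes, dequantize(codes, scale))
```

Storing 4-bit codes in place of 16-bit floats cuts weight storage by roughly a factor of four, before accounting for the stored scales. The price is a rounding error of at most half a scale step per weight.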

Abstract

We propose a resource-efficient framework for compressing large language models through knowledge distillation, combined with guided chain-of-thought reinforcement learning. Using Qwen 3B as the teacher and Qwen 0.5B as the student, we apply knowledge distillation across the English Dolly-15k, Spanish Dolly-15k, and BugNet and PyTorrent code datasets, with hyperparameters tuned in the English setting to optimize student performance. Across tasks, the distilled student retains a substantial portion of the teacher's capability while remaining significantly smaller: 70% to 91% in English, up to 95% in Spanish, and up to 93.5% ROUGE-L in code. For coding tasks, integrating chain-of-thought prompting with Group Relative Policy Optimization using CoT-annotated Codeforces data improves reasoning coherence and solution correctness compared to knowledge distillation alone. Post-training 4-bit weight…
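The core idea of Group Relative Policy Optimization named in the abstract is that each sampled completion is scored relative to the other completions drawn for the same prompt. The sketch below computes those group-relative advantages. The reward values and the function name `group_relative_advantages` are hypothetical, and the full policy update (clipping, KL penalty) is omitted.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group.

    `rewards` holds one scalar per completion sampled for the same prompt.
    `eps` guards against division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 if a sampled solution passes the tests, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above the group average receive positive advantages and are reinforced, while those below receive negative ones. No separate value network is needed because the group itself provides the baseline.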

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques