Knowledge Distillation for Large Language Models
Alejandro Paredes La Torre, Barbara Flores, Diego Rodriguez

TL;DR
This paper introduces a resource-efficient method for compressing large language models by combining knowledge distillation with guided chain-of-thought reinforcement learning; a Qwen 0.5B student retains 70% to 95% of its Qwen 3B teacher's performance across English, Spanish, and code tasks.
Contribution
The paper presents a framework that integrates knowledge distillation with guided chain-of-thought reinforcement learning to produce compact, high-performing language models.
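To make the distillation half of the framework concrete, here is a minimal sketch of a soft-target distillation loss. This summary does not state the paper's exact loss, so the temperature-scaled KL divergence below (the common formulation from the distillation literature) is an assumption, as are the function names `softmax` and `kd_loss` and the default temperature.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumed formulation: the paper's actual objective may differ,
    for example by mixing in a cross-entropy term on ground-truth labels.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

if __name__ == "__main__":
    # Hypothetical next-token logits over a 3-token vocabulary.
    print(kd_loss([3.0, 1.0, 0.0], [2.5, 1.2, 0.1]))
```

The loss is zero when the student's softened distribution matches the teacher's and grows as the two diverge; in practice it would be averaged over every token position in a batch.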
Findings
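The distilled student retains 70% to 91% of teacher performance in English, up to 95% in Spanish, and up to 93.5% ROUGE-L in code.
For coding tasks, chain-of-thought prompting combined with Group Relative Policy Optimization improves reasoning coherence and solution correctness over knowledge distillation alone.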
Quantization reduces memory and inference latency.
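The memory saving from quantization can be illustrated with a small sketch. The quantization scheme the paper uses is not given in this summary; the per-tensor symmetric absmax scheme below is an assumption chosen for simplicity, and `quantize_4bit` and `dequantize` are hypothetical helper names.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integer codes in [-8, 7].

    Assumed scheme: symmetric absmax scaling with one scale per tensor.
    Production methods typically use per-block scales and other refinements.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]

if __name__ == "__main__":
    weights = [0.42, -0.13, 0.0, 0.91, -0.77]  # hypothetical weights
    codes, scale = quantize_4bit(weights)
    print(codes, dequantize(codes, scale))
```

Each weight is stored in 4 bits instead of 16 or 32, plus one shared scale, which is where the memory reduction comes from; the cost is a rounding error of at most half the scale per weight.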
Abstract
We propose a resource-efficient framework for compressing large language models through knowledge distillation combined with guided chain-of-thought reinforcement learning. Using Qwen 3B as the teacher and Qwen 0.5B as the student, we apply knowledge distillation across the English Dolly-15k and Spanish Dolly-15k datasets and the BugNet and PyTorrent code datasets, with hyperparameters tuned in the English setting to optimize student performance. Across tasks, the distilled student retains a substantial portion of the teacher's capability while remaining significantly smaller: 70% to 91% in English, up to 95% in Spanish, and up to 93.5% ROUGE-L in code. For coding tasks, integrating chain-of-thought prompting with Group Relative Policy Optimization, using CoT-annotated Codeforces data, improves reasoning coherence and solution correctness compared to knowledge distillation alone. Post-training 4-bit weight…
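The "group relative" part of Group Relative Policy Optimization can be sketched briefly: several completions are sampled for the same prompt, each is scored, and each score is normalized against its own group rather than against a learned value function. The sketch below assumes the commonly described mean and standard-deviation normalization; the reward values and the name `group_relative_advantages` are hypothetical, and the paper's reward design is not specified in this abstract.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group.

    Assumed form: (r - mean) / (std + eps) using the population
    standard deviation; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

if __name__ == "__main__":
    # Hypothetical rewards for four sampled solutions to one coding
    # problem, e.g. 1.0 if the solution passes its tests, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average receive a positive advantage and are reinforced, while those below average are discouraged; if every completion in a group scores the same, all advantages are zero and the group contributes no gradient signal.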
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
