Embarrassingly Simple Self-Distillation Improves Code Generation

Ruixiang Zhang; Richard He Bai; Huangjie Zheng; Navdeep Jaitly; Ronan Collobert; Yizhe Zhang

arXiv:2604.01193·cs.CL·April 2, 2026

TL;DR

This paper shows that simple self-distillation (SSD), which samples solutions from a model and then fine-tunes the same model on those samples, improves large language models' code generation without a verifier, a teacher model, or reinforcement learning.

Contribution

It introduces a self-distillation method that improves code generation across Qwen and Llama models at 4B, 8B, and 30B scale, and traces the gains to a precision-exploration conflict in LLM decoding.

Findings

01

SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6.

02

Gains are concentrated on harder problems and generalize across models and scales.

03

SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters.

Abstract

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful…
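
The sampling step described in the abstract can be illustrated with a minimal sketch. The abstract only says "certain temperature and truncation configurations", so the use of top-k and top-p (nucleus) truncation here is an assumption, as are the function names `truncated_temperature_probs` and `sample_token` and all parameter values. The sketch covers only how one next-token distribution is reshaped before sampling; it does not reproduce the paper's settings or the fine-tuning step.

```python
import math
import random


def truncated_temperature_probs(logits, temperature=1.0, top_k=None, top_p=None):
    """Return next-token probabilities after temperature scaling and truncation.

    Hypothetical helper: top_k / top_p are assumed forms of "truncation".
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p is not None:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept

    # Renormalize over the surviving tokens; truncated tokens get zero mass.
    kept_mass = sum(probs[i] for i in order)
    out = [0.0] * len(probs)
    for i in order:
        out[i] = probs[i] / kept_mass
    return out


def sample_token(logits, rng, **kwargs):
    """Draw one token id from the truncated, temperature-scaled distribution."""
    probs = truncated_temperature_probs(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values for illustration
    print(truncated_temperature_probs(toy_logits, temperature=1.5, top_k=3))
    print(sample_token(toy_logits, rng, temperature=1.5, top_k=3))
```

In an SSD-style pipeline, sequences sampled this way from the model would become the targets for standard supervised fine-tuning of that same model, with no correctness filtering applied, per the abstract.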

Code & Models

Repositories

apple/ml-ssd
github
