Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Yecheng Wu; Song Han; Hai Cai

arXiv:2604.13010 · cs.LG · May 11, 2026

TL;DR

Lightning OPD is an offline on-policy distillation (OPD) method for large language models that enforces teacher consistency, achieving performance comparable to standard online OPD with significantly higher training efficiency.

Contribution

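The paper proposes Lightning OPD, an offline on-policy distillation framework that precomputes teacher log-probabilities once, eliminating the need for a live teacher server, and enforces teacher consistency (the same teacher is used for both SFT and OPD) so that the offline variant matches standard OPD.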

Findings

1. Lightning OPD achieves 4.0x higher training efficiency than standard OPD.

2. Starting from an SFT checkpoint, it reaches 69.9% on AIME 2024 with just 30 GPU hours.

3. The method scales to Mixture-of-Experts (MoE) architectures, training Qwen3-30B-A3B to 71.0% on AIME 2024.

Abstract

On-policy distillation (OPD) is an effective post-training paradigm for large language models but requires a live teacher server throughout training, resulting in substantial infrastructure overhead. We investigate whether OPD can be performed offline by precomputing teacher log-probabilities once over SFT rollouts and reusing them during training. We find that naively doing so fails to reliably match standard OPD, and trace the root cause to a previously overlooked condition we term teacher consistency, requiring that the same teacher be used for both supervised fine-tuning and OPD. Violating this condition introduces a gradient bias that degrades performance for both offline and online OPD. Building on this insight, we propose Lightning OPD, an offline on-policy distillation framework that enforces teacher consistency and eliminates the need for a live teacher server entirely. We…
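
The core idea in the abstract, querying the teacher once over fixed SFT rollouts and reusing the cached log-probabilities during training, can be sketched as below. This is a minimal illustration under stated assumptions, not the Lightning OPD implementation: the `(prefix, token) -> log-prob` model interface, the function names `precompute_teacher_logprobs` and `offline_opd_epoch`, and the toy context-free models are all hypothetical, and the per-token log-ratio shown is the generic reverse-KL signal used in on-policy distillation, not necessarily the paper's exact loss.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces for illustration only.
Rollout = Tuple[int, ...]                    # token ids sampled once from the SFT model
LogProbFn = Callable[[Rollout, int], float]  # (prefix, next_token) -> log-probability


def precompute_teacher_logprobs(
    rollouts: Sequence[Rollout], teacher: LogProbFn
) -> Dict[Rollout, List[float]]:
    """Offline pass: query the teacher once per token of every rollout and cache the results."""
    return {r: [teacher(r[:t], r[t]) for t in range(len(r))] for r in rollouts}


def offline_opd_epoch(
    rollouts: Sequence[Rollout], student: LogProbFn, teacher_cache: Dict[Rollout, List[float]]
) -> float:
    """One pass that reads teacher log-probs from the cache instead of a live teacher server.

    Returns the mean per-token log-ratio log p_student(x_t) - log p_teacher(x_t) on the
    rollout tokens. In generic OPD the negated log-ratio acts as a per-token advantage for a
    policy-gradient update; the parameter update itself is omitted in this sketch.
    """
    total, count = 0.0, 0
    for r in rollouts:
        for t, teacher_lp in enumerate(teacher_cache[r]):
            total += student(r[:t], r[t]) - teacher_lp
            count += 1
    return total / count


# Toy usage: context-free "models" over a two-token vocabulary (hypothetical).
teacher_calls = {"n": 0}


def toy_teacher(prefix: Rollout, token: int) -> float:
    teacher_calls["n"] += 1  # count queries to show the teacher is only used offline
    return math.log({0: 0.5, 1: 0.5}[token])


def toy_student(prefix: Rollout, token: int) -> float:
    return math.log({0: 0.8, 1: 0.2}[token])


rollouts = [(0, 0, 1), (1, 0)]
cache = precompute_teacher_logprobs(rollouts, toy_teacher)
for epoch in range(3):
    mean_log_ratio = offline_opd_epoch(rollouts, toy_student, cache)

print(teacher_calls["n"], round(mean_log_ratio, 4))
```

The toy teacher is called 5 times (once per cached token) no matter how many epochs run, which is the infrastructure saving the abstract describes. Because the rollouts are fixed SFT samples rather than fresh draws from the current student, the log-ratio is a strictly on-policy estimate only at the start of training; the abstract reports that this reuse reliably matches standard OPD only under teacher consistency, a condition the sketch does not model.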

Code & Models

Repositories

jet-ai-projects/Lightning-OPD (GitHub)