Hybrid Policy Distillation for LLMs

Wenhong Zhu; Ruobing Xie; Rui Wang; Pengfei Liu

arXiv:2604.20244·cs.CL·April 23, 2026

TL;DR

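This paper introduces Hybrid Policy Distillation (HPD), a knowledge distillation method for large language models that combines forward and reverse KL divergence and pairs off-policy data with lightweight, approximate on-policy sampling, aiming for more efficient and stable training across math reasoning, dialogue, and code tasks.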

Contribution

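It breaks down the design of existing knowledge distillation methods, presents a unified view that reformulates distillation as a reweighted log-likelihood objective at the token level, and proposes HPD to improve optimization stability, computational efficiency, and final performance.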
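
To give a sense of what a token-level log-likelihood reformulation can look like, the standard gradient identities for the two KL directions are written out below. This is a textbook illustration and not necessarily the paper's exact formulation: the notation is an assumption, with p the teacher's next-token distribution for a fixed context, q_θ the student's, and v ranging over the vocabulary.

```latex
\begin{aligned}
-\nabla_\theta\, \mathrm{KL}(p \,\|\, q_\theta)
  &= \sum_{v} p(v)\, \nabla_\theta \log q_\theta(v), \\
-\nabla_\theta\, \mathrm{KL}(q_\theta \,\|\, p)
  &= \sum_{v} q_\theta(v)\, \log\frac{p(v)}{q_\theta(v)}\, \nabla_\theta \log q_\theta(v).
\end{aligned}
```

Both descent directions have the form of a weighted sum of log-likelihood gradients and differ only in the per-token weight: p(v) for forward KL, and q_θ(v) log(p(v)/q_θ(v)) for reverse KL. The second line uses the fact that Σ_v ∇_θ q_θ(v) = 0.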

Findings

01

HPD improves optimization stability, computational efficiency, and final performance across diverse model families and scales.

02

HPD combines forward and reverse KL to balance mode coverage and mode-seeking.

03

Code is available at https://github.com/zwhong714/Hybrid-Policy-Distillation.

Abstract

Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and data regime. We break down the design of existing KD methods and present a unified view that establishes connections between them, reformulating KD as a reweighted log-likelihood objective at the token level. We further propose Hybrid Policy Distillation (HPD), which integrates the complementary advantages of forward and reverse KL to balance mode coverage and mode-seeking, and combines off-policy data with lightweight, approximate on-policy sampling. We validate HPD on long-generation math reasoning as well as short-generation dialogue and code tasks, demonstrating improved optimization stability, computational efficiency, and final performance across diverse model families and scales. The…
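
As a rough illustration of the two divergence directions named in the abstract, the sketch below computes forward KL (teacher to student) and reverse KL (student to teacher) for a single token position and mixes them. The convex combination with a fixed `alpha`, the function names, and the example logits are all assumptions made for illustration; the abstract does not specify how HPD actually combines the two terms.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) = sum_v p(v) * log(p(v) / q(v)); zero-mass terms are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def hybrid_token_loss(teacher_logits, student_logits, alpha=0.5):
    """Hypothetical mix of forward and reverse KL at one token position.

    `alpha` is an illustrative knob, not a parameter taken from the paper.
    Returns (mixed loss, forward KL, reverse KL).
    """
    p = softmax(teacher_logits)   # teacher next-token distribution
    q = softmax(student_logits)   # student next-token distribution
    forward = kl(p, q)            # mode-covering: penalizes q missing teacher mass
    reverse = kl(q, p)            # mode-seeking: penalizes q mass where teacher has little
    return alpha * forward + (1.0 - alpha) * reverse, forward, reverse


# Teacher spreads mass over two tokens; student commits to only one of them.
teacher = [2.0, 1.9, -1.0, -3.0]
student = [3.0, -1.0, -1.0, -3.0]
loss, fwd, rev = hybrid_token_loss(teacher, student, alpha=0.5)
print(f"forward KL={fwd:.4f}  reverse KL={rev:.4f}  mixed={loss:.4f}")
```

In this toy case the student ignores one of the teacher's two likely tokens, which forward KL penalizes more heavily than reverse KL does; mixing the two is one simple way to trade off the two behaviors.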

Code & Models

Repositories

zwhong714/Hybrid-Policy-Distillation (GitHub)
