Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

Yihong Liu; Raoyuan Zhao; Michael A. Hedderich; Hinrich Schütze

arXiv:2605.09548·cs.CL·May 12, 2026

TL;DR

This paper introduces COPSD (Crosslingual On-Policy Self-Distillation), a method that improves multilingual reasoning in large language models by transferring the model's own high-resource (English) reasoning behavior to low-resource languages through self-distillation, improving performance across 17 low-resource African languages.

Contribution

The paper presents a crosslingual self-distillation approach that improves low-resource language reasoning in LLMs and outperforms previous methods such as GRPO.

Findings

01 COPSD improves reasoning accuracy across 17 low-resource African languages.

02 It improves answer-format adherence and test-time scaling.

03 Code and data are publicly available in the cisnlp/COPSD repository on GitHub.

Abstract

Large language models (LLMs) have achieved remarkable progress in mathematical reasoning, but this ability is not equally accessible across languages. Low-resource languages in particular exhibit much lower reasoning performance. To address this, we propose Crosslingual On-Policy Self-Distillation (COPSD), which transfers a model's own high-resource reasoning behavior to low-resource languages. COPSD uses the same model as student and teacher: the student sees only the low-resource problem, while the teacher receives privileged crosslingual context, including the problem translation and reference solution in English. Training minimizes full-distribution token-level divergence on the student's own rollouts, providing dense supervision while avoiding the sparsity and instability of outcome-only reinforcement learning (RL). Experiments on 17 low-resource African languages show that COPSD…
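The training signal described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the prompt template, the function names (`build_prompts`, `reverse_kl`, `distill_loss`), and the choice of reverse KL, KL(student || teacher), are all assumptions made for illustration, since the abstract only says "full-distribution token-level divergence". The logits here are plain lists standing in for the outputs of one model run under two different contexts.

```python
import math
from typing import List, Tuple


def build_prompts(problem_lr: str, problem_en: str, solution_en: str) -> Tuple[str, str]:
    """Return (student_prompt, teacher_prompt). Hypothetical template."""
    # The student sees only the low-resource problem.
    student = f"Problem: {problem_lr}\nAnswer:"
    # The teacher (same model) also sees the privileged English context.
    teacher = (
        f"Problem: {problem_lr}\n"
        f"English translation: {problem_en}\n"
        f"Reference solution: {solution_en}\nAnswer:"
    )
    return student, teacher


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits: List[float], teacher_logits: List[float]) -> float:
    """KL(student || teacher) over the full vocabulary at one token position.

    Assumption: the divergence direction is not given in the abstract.
    """
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def distill_loss(student_seq: List[List[float]], teacher_seq: List[List[float]]) -> float:
    """Mean per-token divergence along one student-sampled rollout.

    Both sequences are scored on the same rollout tokens; only the
    conditioning prompt differs between student and teacher.
    """
    if len(student_seq) != len(teacher_seq):
        raise ValueError("student and teacher must score the same rollout")
    per_token = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(per_token) / len(per_token)
```

Because every position of the rollout contributes a divergence term, the loss is dense: each generated token receives a signal, in contrast to a single outcome reward per rollout. In a real setup the teacher's logits would be computed without gradient so that only the student pass is updated; that detail is also an assumption here.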

Code & Models

Repositories

cisnlp/COPSD (GitHub)
