Crosslingual On-Policy Self-Distillation for Multilingual Reasoning
Yihong Liu, Raoyuan Zhao, Michael A. Hedderich, Hinrich Schütze

TL;DR
This paper introduces COPSD, a method that improves multilingual reasoning in large language models by transferring the model's own high-resource-language reasoning to low-resource languages through on-policy self-distillation, with gains reported across 17 low-resource African languages.
Contribution
The paper presents a crosslingual on-policy self-distillation approach that improves low-resource-language reasoning in LLMs and outperforms prior methods such as GRPO.
Findings
COPSD improves reasoning accuracy across 17 low-resource languages.
It enhances answer-format adherence and test-time scaling.
Code and data are publicly available at the provided GitHub link.
Abstract
Large language models (LLMs) have achieved remarkable progress in mathematical reasoning, but this ability is not equally accessible across languages. Low-resource languages in particular exhibit much lower reasoning performance. To address this, we propose Crosslingual On-Policy Self-Distillation (COPSD), which transfers a model's own high-resource reasoning behavior to low-resource languages. COPSD uses the same model as student and teacher: the student sees only the low-resource problem, while the teacher receives privileged crosslingual context, including the problem translation and reference solution in English. Training minimizes a full-distribution, token-level divergence on the student's own rollouts, providing dense supervision while avoiding the sparsity and instability of outcome-only reinforcement learning (RL). Experiments on 17 low-resource African languages show that COPSD…
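The training objective described in the abstract can be illustrated with a small sketch. The excerpt does not say which divergence is used, so this example assumes KL(teacher || student) averaged over the tokens of one student-sampled rollout. The function names and the toy logits are hypothetical; in the actual method both sets of logits would come from the same model, scored on the same student-generated tokens, with only the teacher prompt containing the English translation and reference solution.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) over the full vocabulary at one position.

    Assumption: the divergence direction is chosen for illustration only;
    the abstract excerpt does not specify it.
    """
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def rollout_distillation_loss(teacher_logits_seq, student_logits_seq):
    """Mean per-token divergence along one rollout sampled from the student."""
    if len(teacher_logits_seq) != len(student_logits_seq):
        raise ValueError("teacher and student must score the same tokens")
    per_token = [
        token_kl(t, s) for t, s in zip(teacher_logits_seq, student_logits_seq)
    ]
    return sum(per_token) / len(per_token)


# Toy rollout of two tokens over a three-word vocabulary (made-up numbers).
# Position 0: the privileged context shifts the teacher's distribution.
# Position 1: teacher and student agree, so that token contributes zero loss.
student_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_logits = [[2.5, 0.0, -1.5], [0.1, 0.2, 0.3]]

loss = rollout_distillation_loss(teacher_logits, student_logits)
```

Because every position contributes a term computed over the whole vocabulary, the signal is dense, in contrast to an outcome-only reward that yields a single scalar per rollout.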
