Making Qwen3 Think in Korean with Reinforcement Learning

Jungyup Lee; Jemin Kim; Sang Park; SeungJae Lee

arXiv:2508.10355·cs.CL·August 15, 2025

TL;DR

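This paper introduces a two-stage fine-tuning method that makes Qwen3 14B reason natively in Korean: supervised fine-tuning on a Korean reasoning dataset, followed by reinforcement learning with a customized GRPO algorithm. An oracle judge model calibrates the reward signal to keep the reinforcement learning stage stable, and the authors report improvements on Korean reasoning tasks.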

Contribution

The paper presents a two-stage fine-tuning approach for improving Korean reasoning in large language models. Its reinforcement learning stage uses an oracle judge model to calibrate the reward signal, which addresses GRPO stability problems such as reward hacking and policy collapse.

Findings

01. Significant improvement in Korean reasoning benchmarks.

02. Enhanced problem-solving in math and coding tasks.

03. Stable reinforcement learning training with the oracle judge.

Abstract

We present a two-stage fine-tuning approach to make the large language model Qwen3 14B "think" natively in Korean. In the first stage, supervised fine-tuning (SFT) on a high-quality Korean reasoning dataset establishes a strong foundation in Korean logical reasoning, yielding notable improvements in Korean-language tasks and even some gains in general reasoning ability. In the second stage, we employ reinforcement learning with a customized Group Relative Policy Optimization (GRPO) algorithm to further enhance both Korean reasoning alignment and overall problem-solving performance. We address critical stability challenges in GRPO training - such as reward hacking and policy collapse - by introducing an oracle judge model that calibrates the reward signal. Our approach achieves stable learning (avoiding the collapse observed in naive GRPO) and leads to steady, incremental performance…
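
The abstract names the two ingredients of the second stage, GRPO and an oracle judge that calibrates the reward, but does not spell out how they are combined. The sketch below is illustrative only. `group_relative_advantages` shows the standard GRPO idea of standardizing rewards within a group of completions sampled for the same prompt. `calibrated_rewards` is a hypothetical stand-in for the judge, assuming it returns a score in [0, 1] that scales each raw reward; the paper's actual calibration rule is not given here, and all numbers are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def calibrated_rewards(raw_rewards, judge_scores):
    """Hypothetical calibration (an assumption, not the paper's rule):
    scale each raw reward by a judge score in [0, 1]."""
    return [r * s for r, s in zip(raw_rewards, judge_scores)]


# One prompt, four sampled completions (illustrative numbers only).
raw = [1.0, 1.0, 0.0, 0.0]    # e.g. a rule-based correctness reward
judge = [1.0, 0.0, 1.0, 1.0]  # the judge rejects the second completion
rewards = calibrated_rewards(raw, judge)
advantages = group_relative_advantages(rewards)
```

In this toy group, the second completion earns a raw reward of 1.0, but the judge score of 0.0 removes it, so only the first completion ends up with a positive advantage. A gate of this kind is one way a judge could blunt reward hacking, where completions satisfy the raw reward without actually being good answers.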
