Teaching Language Models to Think in Code

Hyeon Hwang; Jiwoo Lee; Jaewoo Kang

arXiv:2605.07237·cs.CL·May 12, 2026

TL;DR

ThinC is a code-centric reasoning framework for language models: after a brief natural-language (NL) planning step, all reasoning proceeds through executed code blocks instead of NL, which improves performance on math benchmarks.

Contribution

The paper makes code the primary reasoner rather than a tool invoked by NL reasoning. The resulting ThinC-4B model outperforms existing tool-integrated reasoning (TIR) baselines on five math benchmarks.

Findings

01

ThinC-4B outperforms all TIR baselines on five math benchmarks.

02

99.2% of answers are grounded in interpreter output.

03

The model recovers reliably from code execution failures.

Abstract

Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. A ThinC trajectory begins with a brief NL planning step, after which all reasoning unfolds through code blocks connected only by their execution outputs. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning. ThinC-4B consistently outperforms every TIR baseline on…
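
The trajectory shape described in the abstract (a brief NL plan, then code blocks linked only by their execution outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_block` and `run_trajectory`, the shared namespace across blocks, and the toy problem are all assumptions made for illustration, and in ThinC a model would generate each block after seeing the previous output rather than reading from a fixed list.

```python
import contextlib
import io


def run_block(code: str, namespace: dict) -> str:
    """Execute one code block; return its printed output, or the error text.

    Hypothetical helper: a real system would use a sandboxed interpreter.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # The error text becomes the "execution output" the next step would see.
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def run_trajectory(blocks: list[str]) -> list[str]:
    """Run code blocks in order and collect each block's execution output.

    Assumption: variables persist between blocks via a shared namespace.
    """
    namespace: dict = {}
    return [run_block(code, namespace) for code in blocks]


# Brief NL planning step (the only natural-language part of the trajectory).
plan = "Sum the integers from 1 to 100, then check the closed form n(n+1)/2."

# All further reasoning is code; each block's output is what links the steps.
blocks = [
    "total = sum(range(1, 101))\nprint(total)",
    "print(total == 100 * 101 // 2)",
]

outputs = run_trajectory(blocks)
final_answer = outputs[0]  # the answer is read off interpreter output
print(plan)
print(outputs)
```

In this sketch the final answer is taken directly from an execution output, which is one simple reading of an answer being "grounded in interpreter output"; the paper's exact criterion is not given in the text above.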
