Teaching Language Models to Think in Code
Hyeon Hwang, Jiwoo Lee, Jaewoo Kang

TL;DR
ThinC is a code-centric reasoning framework for language models: after a brief natural-language (NL) plan, all reasoning is carried out through executed code rather than NL, which improves performance on math benchmarks.
Contribution
The paper makes code the primary reasoner rather than a tool invoked by NL reasoning, and the resulting models outperform existing tool-integrated reasoning (TIR) methods on five math benchmarks.
Findings
ThinC-4B outperforms all TIR baselines on five math benchmarks.
99.2% of answers are grounded in interpreter output.
The model recovers reliably from code execution failures.
Abstract
Tool-integrated reasoning (TIR) has emerged as a dominant paradigm for mathematical problem solving in language models, combining natural language (NL) reasoning with code execution. However, this interleaved setup has three key limitations: code often acts as a post-hoc verifier, intermediate NL computations are error-prone, and NL and code play overlapping rather than clearly distinct roles. We propose ThinC (Thinking in Code), a framework in which code itself serves as the reasoner rather than as a tool invoked by NL. A ThinC trajectory begins with a brief NL planning step, after which all reasoning unfolds through code blocks connected only by their execution outputs. We distill 12.2k code-centric trajectories from a teacher model and train ThinC-1.7B and ThinC-4B with supervised fine-tuning followed by reinforcement learning. ThinC-4B consistently outperforms every TIR baseline on…
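The trajectory shape described in the abstract (one brief NL plan, then code blocks linked only by their execution outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_block`, `run_trajectory`, and `scripted_policy` are hypothetical names, and the scripted policy is a stand-in for a trained model, assumed here only so the example runs on its own. The first block fails on purpose to show how an interpreter error can be fed back as the next step's input.

```python
import contextlib
import io
import traceback


def run_block(code, namespace):
    """Execute one code block in a shared namespace; return (ok, text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        return True, buffer.getvalue().strip()
    except Exception:
        # Keep only the final traceback line (exception type and message).
        return False, traceback.format_exc().strip().splitlines()[-1]


def run_trajectory(plan, policy, max_steps=8):
    """Alternate policy-written code blocks with their execution results.

    After the initial plan, the only text added between code blocks is
    interpreter output (or an error message), never free-form NL reasoning.
    """
    namespace = {}
    history = [("plan", plan)]
    for _ in range(max_steps):
        code = policy(history)
        if code is None:  # the policy signals that it is done
            break
        ok, text = run_block(code, namespace)
        history.append(("code", code))
        history.append(("output" if ok else "error", text))
    return history


def scripted_policy(history):
    """Hypothetical stand-in for a model: picks the next block from the last event."""
    kind, _ = history[-1]
    if kind == "plan":
        # Deliberate typo (totl) so the first block raises a NameError.
        return "total = sum(k * k for k in range(1, 11))\nprint(totl)"
    if kind == "error":
        # Recovery step: total was already assigned before the failure.
        return "print(total)"
    return None


history = run_trajectory("Sum the squares of 1 through 10.", scripted_policy)
for kind, text in history:
    print(f"[{kind}] {text}")
```

In this sketch the final answer is read from the last `output` entry, so it is grounded in what the interpreter printed rather than in a value computed in prose.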
