Generating Verifiable Chain of Thoughts from Execution-Traces

Shailja Thakur; Vaibhav Saxena; Rohan Kulkarni; Shivdeep Singh; Parameswaran Selvam; Hima Patel; Hiroshi Kanayama

arXiv:2512.00127 · cs.SE · April 28, 2026

TL;DR

This paper introduces a pipeline that generates reasoning traces for code and verifies them against actual program execution traces. Training language models on these verified rationales improves their ability to reason about and generate code.

Contribution

It contributes a dataset of 54,000 execution-trace-verified, bi-directional rationales for training models. Models fine-tuned on this dataset show substantial gains in both code reasoning and code generation.
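The bi-directional framing described in the abstract (forward: input → output; backward: output → input) can be illustrated with a small Python sketch. Everything here is hypothetical: the `Example` record, `make_bidirectional`, `check_backward`, and the prompt wording are illustrative assumptions, not the format of the released dataset. The sketch only shows how both directions can be grounded in actual execution. The forward answer is whatever the program really returns, and a backward answer is accepted only if running the program again reproduces the output.

```python
import ast
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical record format; not the schema of the released dataset.


@dataclass
class Example:
    direction: str  # "forward" (input -> output) or "backward" (output -> input)
    prompt: str
    answer: str


def make_bidirectional(
    func: Callable[..., Any], source: str, args: tuple[Any, ...]
) -> list[Example]:
    """Build one forward and one backward example from a single executed call."""
    output = func(*args)  # ground truth comes from running the code
    call = f"{func.__name__}({', '.join(map(repr, args))})"
    forward = Example("forward", f"{source}\nWhat does {call} return?", repr(output))
    backward = Example(
        "backward",
        f"{source}\nGive arguments (as a tuple) for which "
        f"{func.__name__} returns {output!r}.",
        repr(args),
    )
    return [forward, backward]


def check_backward(func: Callable[..., Any], candidate: str, expected: Any) -> bool:
    """Accept any argument tuple that reproduces the output when re-executed."""
    try:
        args = ast.literal_eval(candidate)
    except (ValueError, SyntaxError):
        return False  # not a Python literal
    if not isinstance(args, tuple):
        return False
    try:
        return func(*args) == expected
    except Exception:
        return False  # a crashing candidate is simply wrong


SOURCE = """def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
"""
namespace: dict[str, Any] = {}
exec(SOURCE, namespace)  # keeps the shown source and the executed function in sync
func = namespace["sum_of_squares"]

for example in make_bidirectional(func, SOURCE, (3,)):
    print(example.direction, "->", example.answer)
print(check_backward(func, "(3,)", 5), check_backward(func, "(2,)", 5))
```

Checking backward answers by re-execution, rather than by string comparison with the recorded input, matters because one output can have several valid inputs. For example, `abs` returns 3 for both 3 and -3.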

Findings

01

Models fine-tuned on the verified data substantially outperform baselines.

02

The pipeline generates 54,000 verified rationales for training.

03

Verification quality correlates with reasoning and code generation performance.

Abstract

Getting language models to reason correctly about code requires training on data where each reasoning step can be checked. Current synthetic Chain-of-Thought (CoT) training data often consists of plausible-sounding explanations generated by teacher models rather than verifiable accounts of actual program behavior. Models trained on such data learn reasoning patterns that are syntactically well-formed but logically flawed. To address this, we build a pipeline that generates execution-trace-verified CoT rationales by instrumenting code to capture execution traces, narrating those traces in natural language, and cross-checking each narration against the original trace. We systematically create 54,000 verified, bi-directional rationales that teach models to reason both forward (input $\to$ output) and backward (output $\to$ input). Models fine-tuned on our verified data achieve substantial improvements,…
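The three stages named in the abstract (instrument code to capture a trace, narrate the trace, cross-check each narrated claim against the trace) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Instrumentation here uses the standard library's `sys.settrace`. A deterministic sentence template in the hypothetical `narrate` function stands in for natural-language narration. The hypothetical `verify` function accepts a narration only if every claim it makes is matched, in execution order, by a recorded snapshot. The function names, sentence template, and matching rule are all illustrative assumptions; the IBM/verified-code-cot repository is the place to check what the pipeline actually does.

```python
import ast
import re
import sys
from types import FrameType
from typing import Any, Callable, Optional

# Sketch only: capture_trace / narrate / verify are hypothetical names,
# not the API of the authors' pipeline.
Snapshot = tuple[int, dict[str, Any]]  # (line number, shallow copy of locals)

CLAIM = re.compile(r"When execution reaches line (\d+), `(\w+)` is (.+)\.$")


def capture_trace(func: Callable[..., Any], *args: Any) -> tuple[Any, list[Snapshot]]:
    """Instrument: run func(*args) and snapshot its locals at each line and at return."""
    snapshots: list[Snapshot] = []
    target = func.__code__

    def tracer(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        if frame.f_code is not target:
            return None  # skip frames belonging to other functions
        if event in ("line", "return"):
            # A "line" snapshot shows the state just before that line executes.
            snapshots.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    previous_tracer = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous_tracer)
    return result, snapshots


def narrate(snapshots: list[Snapshot]) -> list[str]:
    """Stand-in narrator: one templated sentence per variable whose value changed."""
    sentences: list[str] = []
    previous: dict[str, Any] = {}
    for lineno, local_vars in snapshots:
        for name, value in local_vars.items():
            if name not in previous or previous[name] != value:
                sentences.append(
                    f"When execution reaches line {lineno}, `{name}` is {value!r}."
                )
        previous = local_vars
    return sentences


def verify(sentences: list[str], snapshots: list[Snapshot]) -> bool:
    """Cross-check: every claim must match a snapshot, in execution order."""
    cursor = 0
    for sentence in sentences:
        match = CLAIM.match(sentence)
        if match is None:
            return False  # unreadable claim: reject instead of trusting it
        lineno, name = int(match.group(1)), match.group(2)
        try:
            value = ast.literal_eval(match.group(3))
        except (ValueError, SyntaxError):
            return False
        # Scan forward from the last supported claim; never move backwards.
        while cursor < len(snapshots):
            line, local_vars = snapshots[cursor]
            if line == lineno and name in local_vars and local_vars[name] == value:
                break
            cursor += 1
        else:
            return False  # no later snapshot supports this claim
    return True


def sum_of_squares(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


result, trace = capture_trace(sum_of_squares, 3)
story = narrate(trace)
tampered = [s.replace("`total` is 5", "`total` is 6") for s in story]
print(result, verify(story, trace), verify(tampered, trace))  # prints: 5 True False
```

Rejecting unparseable sentences, rather than skipping them, keeps the check conservative: a claim the verifier cannot read is treated the same as a claim it cannot confirm. The order-preserving scan also rejects narrations that state true values in the wrong sequence.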

Code & Models

Repositories

IBM/verified-code-cot (GitHub)
