TL;DR
This paper introduces a pipeline that generates reasoning rationales for code and checks each one against an actual execution trace. Models trained on these verified rationales reason about and generate code more accurately.
Contribution
The paper builds a dataset of 54,000 execution-trace-verified rationales for training; models fine-tuned on it show substantial improvements in code reasoning and generation.
Findings
Models trained on verified data outperform baselines significantly.
The pipeline generates 54,000 verified rationales for training.
Verification quality correlates with reasoning and code generation performance.
Abstract
Getting language models to reason correctly about code requires training on data where each reasoning step can be checked. Current synthetic Chain-of-Thought (CoT) training data often consists of plausible-sounding explanations generated by teacher models rather than verifiable accounts of actual program behavior. Models trained on such data learn logically flawed reasoning patterns despite syntactic correctness. To address this, we build a pipeline that generates execution-trace-verified CoT rationales by instrumenting code to capture traces, narrating them into natural language, and cross-checking each narration against the original trace. We systematically create 54,000 verified, bi-directional rationales that teach models to reason both forward (input→output) and backward (output→input). Models fine-tuned on our verified data achieve substantial improvements,…
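The three stages named in the abstract (capture a trace, narrate it, cross-check the narration against the trace) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names capture_trace, narrate, verify, and running_sum are hypothetical, and the narrator here is a simple template standing in for whatever teacher model the paper uses. It assumes only the standard-library sys.settrace hook.

```python
import sys
from typing import Any, Callable

_MISSING = object()  # sentinel for variables absent from a trace step


def capture_trace(func: Callable, *args: Any):
    """Run func(*args), recording its local variables at every executed line."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace calls into other functions
        if event == "line":
            # Snapshot of locals just before this line executes.
            steps.append(dict(frame.f_locals))
        elif event == "return":
            steps.append({"<return>": arg})
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return steps, result


def narrate(steps):
    """Template narrator (assumption: stands in for a teacher model)."""
    rationale = []
    for i, state in enumerate(steps):
        facts = ", ".join(f"{name} = {value!r}" for name, value in state.items())
        rationale.append({"step": i, "text": f"Step {i}: {facts}", "claims": dict(state)})
    return rationale


def verify(rationale, steps):
    """Return the indices of narrated steps whose claims contradict the trace."""
    mismatched = []
    for item in rationale:
        actual = steps[item["step"]]
        if any(actual.get(name, _MISSING) != value for name, value in item["claims"].items()):
            mismatched.append(item["step"])
    return mismatched


def running_sum(xs):
    # Hypothetical program under analysis.
    total = 0
    for x in xs:
        total += x
    return total


steps, result = capture_trace(running_sum, [1, 2, 3])
rationale = narrate(steps)
print(verify(rationale, steps))  # [] when every claim matches the trace
```

A narration that misstates any recorded value (for example, claiming the function returns 7) would show up in the list returned by verify, which is the kind of step-level check the abstract describes; a rationale would be kept for training only if that list is empty.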
