NExT: Teaching Large Language Models to Reason about Code Execution

Ansong Ni; Miltiadis Allamanis; Arman Cohan; Yinlin Deng; Kensen Shi; Charles Sutton; Pengcheng Yin

arXiv:2404.14662 · cs.LG · April 24, 2024 · 6 citations

Open Access

TL;DR

NExT enhances large language models' ability to understand and reason about program execution by training them on execution traces, significantly improving their performance on program repair tasks.

Contribution

The paper introduces NExT, a self-training method that teaches LLMs to reason about code execution using synthetic, execution-aware rationales, improving their debugging capabilities.
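The filtering step at the heart of this kind of self-training can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the names `passes_tests` and `filter_candidates` are hypothetical, the candidates are hard-coded rather than sampled from a model, and the fine-tuning step that would consume the kept examples is omitted.

```python
from typing import List, Tuple

# A candidate pairs a natural-language rationale with a proposed fixed program.
Candidate = Tuple[str, str]


def passes_tests(program_src: str, tests: List[str]) -> bool:
    """Run a candidate program and its assert-style tests in a fresh namespace.

    Assumption: tests are plain `assert` statements given as source strings.
    Real systems would sandbox this execution; `exec` is used only for brevity.
    """
    namespace: dict = {}
    try:
        exec(program_src, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all reject the candidate.
        return False
    return True


def filter_candidates(candidates: List[Candidate], tests: List[str]) -> List[Candidate]:
    """Keep only (rationale, fix) pairs whose fix passes every test."""
    return [(rationale, fix) for rationale, fix in candidates if passes_tests(fix, tests)]


# Hypothetical sampled outputs for one buggy task.
candidates = [
    ("The slice drops the last element, so sum the whole list.",
     "def total(xs):\n    return sum(xs)\n"),
    ("The function already looks correct.",
     "def total(xs):\n    return sum(xs[:-1])\n"),
]
tests = ["assert total([1, 2, 3]) == 6"]

kept = filter_candidates(candidates, tests)
```

Only the first candidate survives here, because its fix passes the test. In a self-training setup the surviving rationale and fix pairs would become training data for the next round, so no manually written rationales are needed.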

Findings

01. NExT improves program fix rates by 26.1% (absolute) on MBPP and 14.3% (absolute) on HumanEval.

02. NExT improves rationale quality, as measured by automated metrics and human raters.

03. The model generalizes to scenarios where program traces are unavailable at test time.

Abstract

A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (a.k.a. rubber duck debugging). However, large language models (LLMs) of code are typically trained on the surface textual form of programs, and thus may lack a semantic understanding of how programs execute at run-time. To address this issue, we propose NExT, a method to teach LLMs to inspect the execution traces of programs (variable states of executed lines) and reason about their run-time behavior through chain-of-thought (CoT) rationales. Specifically, NExT uses self-training to bootstrap a synthetic training set of execution-aware rationales that lead to correct task solutions (e.g., fixed programs) without laborious manual annotation. Experiments on program repair tasks…
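To make "variable states of executed lines" concrete, here is a small sketch that records such a trace for a Python function using the standard `sys.settrace` hook. It is an illustration under stated assumptions, not the trace format used in the paper: the helper `trace_variable_states` and the example `buggy_sum` are hypothetical, and each snapshot is taken just before the corresponding line runs.

```python
import sys
from typing import Any, Callable, Dict, List, Tuple

# One trace entry: (line offset within the function, snapshot of local variables).
State = Tuple[int, Dict[str, Any]]


def trace_variable_states(func: Callable[..., Any], *args: Any) -> Tuple[Any, List[State]]:
    """Run `func(*args)` and record its local variables before each executed line."""
    states: List[State] = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that belong to other functions.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            # Shallow copy, so later rebinding of names does not alter the snapshot.
            states.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever trace hook was installed before.
        sys.settrace(previous)
    return result, states


def buggy_sum(xs):
    total = 0
    for x in xs[:-1]:  # bug: the slice skips the last element
        total += x
    return total


result, states = trace_variable_states(buggy_sum, [1, 2, 3])
for offset, local_vars in states:
    print(f"line +{offset}: {local_vars}")
```

Reading the printed states shows `x` never taking the value 3, which is the kind of run-time evidence a rationale can point to when explaining the bug.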

Taxonomy

Topics: Software Reliability and Analysis Research · Software Engineering Research · Web Application Security Vulnerabilities

Methods: Sparse Evolutionary Training · Pathways Language Model