A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models

Changshu Liu; Reyhaneh Jabbarvand

arXiv:2501.18482·cs.SE·January 31, 2025

TL;DR

This paper presents ExeRScope, a set of tools and heuristics for in-depth analysis of how large language models reason about code execution, so that observations made on one benchmark can be generalized to other datasets.

Contribution

The paper introduces ExeRScope, a novel set of tools and heuristics for analyzing LLMs' code reasoning, addressing the lack of in-depth analysis tools in this domain.

Findings

01 Enables analysis of code properties affecting reasoning
02 Facilitates generalization across datasets
03 Supports understanding of LLMs' reasoning capabilities
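
To make the first finding concrete, here is a minimal sketch of the kind of analysis it describes: grouping output-prediction results by a code property and comparing success rates. It is not ExeRScope's implementation. The property (whether a program contains a loop), the function names `has_loop` and `success_rate_by_property`, and the toy records are all assumptions made for illustration.

```python
import ast
from collections import defaultdict


def has_loop(source: str) -> bool:
    """Hypothetical code property: does the program contain a for/while loop?"""
    tree = ast.parse(source)
    return any(isinstance(node, (ast.For, ast.While)) for node in ast.walk(tree))


def success_rate_by_property(records):
    """Group (source, predicted_output, actual_output) records by the
    property value and return the fraction of correct predictions per group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for source, predicted, actual in records:
        key = has_loop(source)
        totals[key] += 1
        correct[key] += int(predicted == actual)
    return {key: correct[key] / totals[key] for key in totals}


# Made-up records for illustration only; not data from the paper.
records = [
    ("x = 1 + 2\nprint(x)", "3", "3"),
    ("for i in range(3):\n    pass\nprint(i)", "3", "2"),
    ("s = 0\nwhile s < 2:\n    s += 1\nprint(s)", "2", "2"),
]

rates = success_rate_by_property(records)
for has_loop_value, rate in sorted(rates.items()):
    print(f"has_loop={has_loop_value}: success rate {rate:.2f}")
```

Any other static property (nesting depth, number of branches, program length) could replace `has_loop` without changing the grouping logic.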

Abstract

Code execution reasoning is becoming a new non-functional metric that assesses the ability of large language models (LLMs) in programming tasks. State-of-the-art frameworks (CodeMind or REval) and benchmarks (CruxEval) usually focus on LLMs' predictions of a given code's input/output or intermediate variable states/values on a limited set of programs. However, there is no tool for more in-depth analysis of the results. Without such a tool, observations about LLMs' code execution reasoning cannot be generalized to more datasets, preventing the research community and practitioners from devising the next generation of LLMs with better code execution reasoning abilities. This paper introduces ExeRScope, a series of tools and heuristics to analyze the results of code execution reasoning frameworks to better understand the impact of code properties in the studied benchmarks on the code execution…

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research