A Tool for In-depth Analysis of Code Execution Reasoning of Large Language Models
Changshu Liu, Reyhaneh Jabbarvand

TL;DR
This paper presents ExeRScope, a set of tools and heuristics for in-depth analysis of large language models' code execution reasoning, so that observations about how code properties affect that reasoning can be generalized beyond the benchmarks studied.
Contribution
The paper introduces ExeRScope, a set of tools and heuristics for analyzing the results of LLM code execution reasoning frameworks, addressing the lack of in-depth analysis tools in this domain.
Findings
Enables analysis of how code properties affect code execution reasoning
Helps generalize observations about code execution reasoning to more datasets
Supports understanding of LLMs' code execution reasoning capabilities
Abstract
Code execution reasoning is becoming a new non-functional metric for assessing the ability of large language models (LLMs) in programming tasks. State-of-the-art frameworks (CodeMind or REval) and benchmarks (CruxEval) usually focus on LLMs' prediction of a given code's input/output or intermediate variable states/values on a limited set of programs. However, there is no tool for more in-depth analysis of the results. Without such a tool, observations about LLMs' code execution reasoning cannot be generalized to more datasets, which prevents the research community and practitioners from devising the next generation of LLMs with better code execution reasoning abilities. This paper introduces ExeRScope, a series of tools and heuristics for analyzing the results of code execution reasoning frameworks to better understand the impact of code properties in the studied benchmarks on the code execution…
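To make the idea of relating code properties to reasoning outcomes concrete, here is a minimal sketch of that kind of analysis. It is not ExeRScope's implementation: the function names (count_constructs, accuracy_by_property), the three properties counted, and the toy records are all assumptions made up for illustration. It assumes each benchmark result is available as a pair of Python source code and a flag saying whether the model predicted the output correctly.

```python
import ast
from collections import defaultdict


def count_constructs(source: str) -> dict:
    """Count a few syntactic constructs in a Python program.

    Hypothetical property set chosen for illustration; comprehensions
    are not counted as loops here.
    """
    tree = ast.parse(source)
    counts = {"loops": 0, "branches": 0, "calls": 0}
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.While)):
            counts["loops"] += 1
        elif isinstance(node, ast.If):
            counts["branches"] += 1
        elif isinstance(node, ast.Call):
            counts["calls"] += 1
    return counts


def accuracy_by_property(records, prop):
    """Group (source, correct) records by a property count; return accuracy per group."""
    totals = defaultdict(lambda: [0, 0])  # property value -> [num_correct, num_total]
    for source, correct in records:
        value = count_constructs(source)[prop]
        totals[value][0] += int(correct)
        totals[value][1] += 1
    return {value: c / n for value, (c, n) in sorted(totals.items())}


# Made-up toy records (source, prediction_was_correct); not results from the paper.
records = [
    ("def f(x):\n    return x + 1\n", True),
    ("def f(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n", True),
    ("def f(xs):\n    s = 0\n    for x in xs:\n        if x > 0:\n            s += x\n    return s\n", False),
]

print(accuracy_by_property(records, "loops"))     # {0: 1.0, 1: 0.5}
print(accuracy_by_property(records, "branches"))  # {0: 1.0, 1: 0.0}
```

Grouping by a static property in this way is only one possible design; a real analysis might also use dynamic properties, such as loop iteration counts observed at run time, which this sketch does not attempt.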
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Reliability and Analysis Research
