Code Simulation Challenges for Large Language Models
Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

TL;DR
This paper evaluates the ability of Large Language Models to simulate coding and algorithmic tasks, introduces benchmarks, and proposes a novel prompting method called Chain of Simulation (CoSm) to improve their reasoning capabilities.
Contribution
It introduces new benchmarks for code simulation and a novel prompting technique, CoSm, to enhance LLMs' algorithmic reasoning and simulation performance.
Findings
LLMs' simulation ability is affected by algorithmic complexity.
Powerful LLMs show relatively strong but fragile simulation capabilities.
CoSm improves simulation performance by reducing reliance on memorization.
Abstract
Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel…
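To make the notion of "code simulation" concrete, the following is a minimal sketch of how a simulation item could be scored: a small program is executed to obtain the ground-truth value, and a model's predicted value is compared against it. The two example programs, the `result` variable convention, and the helper names `run_program` and `is_correct_simulation` are assumptions made for illustration; they are not taken from the paper's benchmarks.

```python
# Illustrative sketch only: the programs and helper names below are
# hypothetical and are not the paper's actual benchmark items.

# A straight-line program: no branches or loops, each line runs once.
STRAIGHT_LINE = """
a = 7
b = a * 3
c = b - 4
result = c + a
"""

# A nested-loop program: the number of steps to simulate grows
# quadratically with the outer loop bound.
NESTED_LOOP = """
result = 0
for i in range(4):
    for j in range(i):
        result += i * j
"""


def run_program(source: str, result_var: str = "result"):
    """Execute `source` and return the final value of `result_var`.

    This gives the ground truth that a simulated execution must match.
    """
    namespace = {}
    exec(source, {}, namespace)  # only run trusted, locally written snippets
    return namespace[result_var]


def is_correct_simulation(source: str, predicted) -> bool:
    """Return True if a predicted final value matches real execution."""
    return run_program(source) == predicted


if __name__ == "__main__":
    print(run_program(STRAIGHT_LINE))  # 24
    print(run_program(NESTED_LOOP))    # 11
    # A model that answered 24 for the straight-line program is correct;
    # one that answered 12 for the nested loop is not.
    print(is_correct_simulation(STRAIGHT_LINE, 24))
    print(is_correct_simulation(NESTED_LOOP, 12))
```

In this framing, a model "simulates" the program when it produces the final value without running the code, so each intermediate step it tracks incorrectly can change the answer. The nested-loop case requires tracking more intermediate states than the straight-line case, which is the kind of difference the abstract attributes to computational complexity.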
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
