Code Simulation Challenges for Large Language Models

Emanuele La Malfa; Christoph Weinhuber; Orazio Torre; Fangru Lin; Samuele Marro; Anthony Cohn; Nigel Shadbolt; Michael Wooldridge

arXiv:2401.09074 · cs.LG · June 13, 2024 · 1 citation

TL;DR

This paper evaluates the ability of Large Language Models to simulate coding and algorithmic tasks, introduces benchmarks, and proposes a novel prompting method called Chain of Simulation (CoSm) to improve their reasoning capabilities.

Contribution

It introduces new benchmarks for code simulation and a novel prompting technique, CoSm, to enhance LLMs' algorithmic reasoning and simulation performance.

Findings

01 LLMs' simulation ability is affected by algorithmic complexity.

02 Powerful LLMs show relatively strong but fragile simulation capabilities.

03 CoSm improves simulation performance by reducing reliance on memorization.

Abstract

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel…
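To make the task setting concrete, here is a minimal sketch of what simulating a straight-line program and a nested loop can look like. It is an illustration under stated assumptions, not the paper's benchmark code: the example program and the helper names `ground_truth`, `is_correct_simulation`, and `nested_loop_steps` are hypothetical. The reference answer comes from actually executing the code, and a model's predicted final state would be compared against it.

```python
def ground_truth(program: str) -> dict:
    """Execute a program and return the final values of its variables.

    Hypothetical helper: the reference answer for a simulation task is
    whatever a real interpreter produces.
    """
    env: dict = {}
    exec(program, {}, env)  # only run trusted, self-generated snippets this way
    return env


def is_correct_simulation(program: str, predicted: dict) -> bool:
    """Compare a predicted final state (e.g. parsed from a model's answer)
    with the state obtained by real execution."""
    return ground_truth(program) == predicted


def nested_loop_steps(n: int) -> int:
    """Count the inner-body executions of a doubly nested loop over range(n).

    The count grows quadratically in n, so the number of steps that must be
    tracked to simulate the loop grows with the routine's complexity.
    """
    count = 0
    for _ in range(n):
        for _ in range(n):
            count += 1
    return count


# A straight-line program: assignments only, no branches or loops.
STRAIGHT_LINE = "a = 3\nb = a * 2\na = b - 1"

final_state = ground_truth(STRAIGHT_LINE)
steps_small = nested_loop_steps(4)
steps_large = nested_loop_steps(8)
```

Doubling `n` in the nested loop quadruples the number of inner steps, which is one simple way to see why longer or more complex routines leave more room for a step-by-step simulation to go wrong.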

Code & Models

Repositories

emanuelelm/codesimulation (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis