Large Language Models as Code Executors: An Exploratory Study

Chenyang Lyu; Lecheng Yan; Rui Xing; Wenxi Li; Younes Samih; Tianbo Ji; Longyue Wang

arXiv:2410.06667 · cs.CL · October 11, 2024 · 2 cites

TL;DR

This study explores the novel use of Large Language Models as code executors, evaluating their accuracy across various models and introducing an iterative prompting technique to improve performance.

Contribution

The paper presents what the authors describe as the first comprehensive examination of LLMs as code executors, and it proposes an iterative prompting method to improve their accuracy.

Findings

01. The o1 model achieved over 90% accuracy in code execution.

02. Iterative Instruction Prompting improved accuracy by up to 19.46%.

03. Effectiveness in code execution varied across models.

Abstract

The capabilities of Large Language Models (LLMs) have significantly evolved, extending from natural language processing to complex tasks like code understanding and generation. We expand the scope of LLMs' capabilities to a broader context, using LLMs to execute code snippets to obtain the output. This paper pioneers the exploration of LLMs as code executors, where code snippets are directly fed to the models for execution, and outputs are returned. We are the first to comprehensively examine this feasibility across various LLMs, including OpenAI's o1, GPT-4o, GPT-3.5, DeepSeek, and Qwen-Coder. Notably, the o1 model achieved over 90% accuracy in code execution, while others demonstrated lower accuracy levels. Furthermore, we introduce an Iterative Instruction Prompting (IIP) technique that processes code snippets line by line, enhancing the accuracy of weaker models by an average of…
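To make the setup in the abstract concrete, the following is a minimal Python sketch of how one might score a model as a code executor with a line-by-line prompt. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the exact-match scoring rule, and the names `run_snippet`, `build_line_by_line_prompt`, `execution_accuracy`, and `predict` are all hypothetical. `predict` stands in for any function that sends a prompt to a model and returns its answer as text.

```python
import contextlib
import io


def run_snippet(code: str) -> str:
    """Run a trusted snippet in the real interpreter and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"__name__": "__snippet__"})
    return buffer.getvalue().strip()


def build_line_by_line_prompt(code: str) -> str:
    """Number each non-empty line and ask the model to trace the lines in order.

    The wording is an assumption; the source does not give the paper's prompt.
    """
    lines = [ln for ln in code.splitlines() if ln.strip()]
    numbered = "\n".join(f"{i}: {ln}" for i, ln in enumerate(lines, start=1))
    return (
        "Act as a code executor. Go through the program one line at a time, "
        "track the variable values after each line, then give the final output.\n"
        f"{numbered}\nOutput:"
    )


def execution_accuracy(snippets, predict) -> float:
    """Fraction of snippets whose predicted output equals the real output.

    Exact string match is an assumed metric, chosen here for simplicity.
    """
    if not snippets:
        return 0.0
    hits = sum(
        predict(build_line_by_line_prompt(s)).strip() == run_snippet(s)
        for s in snippets
    )
    return hits / len(snippets)


if __name__ == "__main__":
    snippets = ["x = 1 + 2\nprint(x * 2)", "print('a' * 3)"]
    # Stand-in predictor that always answers "6"; a real one would call an LLM.
    print(execution_accuracy(snippets, lambda prompt: "6"))  # prints 0.5
```

The ground truth comes from actually running each snippet, so only trusted code should be passed to `run_snippet`; a sandboxed runner would be the safer choice for arbitrary inputs.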

Taxonomy

Topics: Software Engineering Research

Methods: Linear Layer · Residual Connection · Weight Decay · Attention Is All You Need · Cosine Annealing · Dropout · Byte Pair Encoding · Softmax