Executing as You Generate: Hiding Execution Latency in LLM Code Generation

Zhensu Sun; Zhihao Lin; Zhi Chen; Chengran Yang; Mingyi Zhou; Li Li; David Lo

arXiv:2604.00491 · cs.PL · April 2, 2026

TL;DR

This paper introduces Eager, a method that executes code while the LLM is still generating it. By running generation, detection, and execution as overlapping pipeline stages, it reduces the end-to-end latency of LLM-based coding agents.

Contribution

It formalizes a parallel execution paradigm for LLM code generation, derives latency bounds, and presents Eager, an implementation that achieves substantial latency reductions.

Findings

01. Eager reduces non-overlapped execution latency by up to 99.9%.

02. Eager cuts end-to-end latency by up to 55%.

03. The evaluation covers four benchmarks and seven LLMs.

Abstract

Current LLM-based coding agents follow a serial execution paradigm: the model first generates the complete code, then invokes an interpreter to execute it. This sequential workflow leaves the executor idle during generation and the generator idle during execution, resulting in unnecessary end-to-end latency. We observe that, unlike human developers, LLMs produce code tokens sequentially without revision, making it possible to execute code as it is being generated. We formalize this parallel execution paradigm, modeling it as a three-stage pipeline of generation, detection, and execution, and derive closed-form latency bounds that characterize its speedup potential and operating regimes. We then present Eager, a concrete implementation featuring AST-based chunking, dynamic batching with gated execution, and early error interruption. We evaluate Eager across four benchmarks, seven LLMs,…
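
To make the idea of executing code while it is still being generated more concrete, here is a minimal Python sketch. It is not the paper's implementation: the class name EagerRunner, its methods, and the boundary rule are assumptions made for illustration. The sketch re-parses the complete lines received so far with the standard ast module and treats a top-level statement as finished only once a later top-level statement has started, since a trailing statement may still grow (for example, a loop body that gains more lines). It also stops at the first runtime error, loosely mirroring the early error interruption mentioned above; dynamic batching and gated execution are not modeled.

```python
import ast


class EagerRunner:
    """Hypothetical helper: runs top-level statements of a streamed program
    as soon as a later statement proves that they are complete."""

    def __init__(self):
        self.buffer = ""      # all text received so far
        self.executed = 0     # number of top-level statements already run
        self.namespace = {}   # globals shared by every executed chunk
        self.error = None     # first exception raised by executed code

    def _run(self, statements):
        for node in statements:
            if self.error is not None:
                return  # early interruption: skip everything after a failure
            module = ast.Module(body=[node], type_ignores=[])
            try:
                exec(compile(module, "<stream>", "exec"), self.namespace)
            except Exception as exc:
                self.error = exc
            self.executed += 1

    def feed(self, text):
        """Append newly generated text and run any statements now complete."""
        self.buffer += text
        # Only consider whole lines; the last line may still be mid-token.
        complete = self.buffer[: self.buffer.rfind("\n") + 1]
        try:
            body = ast.parse(complete).body
        except SyntaxError:
            return  # the prefix is not yet valid Python; wait for more text
        # Hold back the last statement: it may still be extended.
        self._run(body[self.executed : len(body) - 1])

    def finish(self):
        """Generation has ended: run whatever has not been executed yet."""
        body = ast.parse(self.buffer).body
        self._run(body[self.executed :])


if __name__ == "__main__":
    runner = EagerRunner()
    # Simulated token stream; chunk boundaries are arbitrary.
    for piece in ["total = 0\n", "for i in ra", "nge(4):\n",
                  "    total += i\n", "result = total", " * 2\n"]:
        runner.feed(piece)
        print(repr(piece), "-> statements executed:", runner.executed)
    runner.finish()
    print("result =", runner.namespace["result"])
```

Re-parsing the whole buffer on every call keeps the sketch short but costs quadratic time in the program length; a practical system would parse incrementally and would run the code in a sandbox rather than with exec in the host process.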
