StackPilot: Autonomous Function Agents for Scalable and Environment-Free Code Execution
Xinkui Zhao, Yifan Zhang, Zhengyi Zhou, Yueshen Xu

TL;DR
StackPilot is a novel multi-agent framework that enables language-agnostic, environment-free code verification and execution, significantly improving reliability over traditional environment-dependent methods.
Contribution
It introduces a Function-as-Agents paradigm, an LLM-as-Executor strategy, and a snapshot mechanism for deterministic context switching, advancing automated code verification.
Findings
Achieves 89%-97% reliability in code verification.
Outperforms baseline approaches in diverse programming tasks.
Operates independently of conventional toolchains.
Abstract
Recent advances in large language models (LLMs) have substantially enhanced automated code generation across a wide range of programming languages. Nonetheless, verifying the correctness and executability of LLM-generated code remains a significant challenge, as traditional methods rely on language-specific compilers and environment-dependent runtimes. To overcome these limitations, we introduce StackPilot, an LLM-native, multi-agent framework designed for language-agnostic code verification and execution, which operates independently of conventional toolchains. StackPilot offers three principal innovations: (1) a Function-as-Agents paradigm, in which each function is modeled as an autonomous agent capable of fine-grained reasoning and collaborative verification; (2) an LLM-as-Executor strategy, which enables scalable verification via stack-based scheduling; and (3) a novel snapshot…
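The stack-based scheduling and snapshot ideas named above can be illustrated with a toy sketch. This is not the authors' implementation: the names `Call`, `Return`, `Frame`, `run`, and the two example agents are hypothetical, and each agent's step function is plain Python standing in for what would be an LLM reasoning about a function. The sketch only shows the control flow: one agent per function, a call stack that decides which agent runs next, and a saved per-frame context that is restored when a caller resumes.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Call:
    """Hypothetical action: the current agent asks another agent to run."""
    callee: str
    args: Tuple[Any, ...]


@dataclass
class Return:
    """Hypothetical action: the current agent finishes with a value."""
    value: Any


@dataclass
class Frame:
    """One stack entry: which agent is active, its arguments, its saved context."""
    agent: str
    args: Tuple[Any, ...]
    snapshot: Dict[str, Any] = field(default_factory=dict)


# An agent step takes (args, working_context, value_returned_by_callee).
# In an LLM-based system this would be a model call; here it is a stub.
Step = Callable[[Tuple[Any, ...], Dict[str, Any], Optional[Any]], Any]


def square_agent(args, ctx, resumed):
    (x,) = args
    return Return(x * x)


def sum_squares_agent(args, ctx, resumed):
    # Record the callee's result, if we are resuming after a call.
    if resumed is not None:
        ctx.setdefault("partials", []).append(resumed)
    done = len(ctx.get("partials", []))
    if done < len(args):
        return Call("square", (args[done],))
    return Return(sum(ctx.get("partials", [])))


def run(agents: Dict[str, Step], entry: str, args: Tuple[Any, ...]) -> Any:
    """Drive agents with an explicit stack instead of a real runtime."""
    stack: List[Frame] = [Frame(entry, args)]
    result: Optional[Any] = None
    while stack:
        frame = stack[-1]
        # Restore the agent's context from its last snapshot.
        working = copy.deepcopy(frame.snapshot)
        action = agents[frame.agent](frame.args, working, result)
        result = None
        if isinstance(action, Call):
            # Save the caller's context before switching to the callee.
            frame.snapshot = working
            stack.append(Frame(action.callee, action.args))
        else:
            stack.pop()
            result = action.value
    return result


AGENTS: Dict[str, Step] = {
    "square": square_agent,
    "sum_squares": sum_squares_agent,
}

if __name__ == "__main__":
    print(run(AGENTS, "sum_squares", (3, 4)))
```

In this sketch the caller's context is copied into the frame before control moves to the callee, so resuming the caller depends only on the saved snapshot and the returned value. How StackPilot actually represents snapshots and schedules agents is not specified in the text above.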
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Software Testing and Debugging Techniques · Distributed and Parallel Computing Systems
