You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects
Islem Bouzenia, Michael Pradel

TL;DR
This paper introduces ExecutionAgent, an LLM-based agent that autonomously builds arbitrary projects from source and runs their test suites. It successfully executed the tests of 33 out of 50 projects, with test results deviating from ground truth by only 7.5%.
Contribution
The paper presents a novel LLM-driven agent that automates cross-project test execution, handling multiple languages and tools with minimal human intervention.
Findings
ExecutionAgent successfully built and executed the test suites of 33 out of 50 projects.
Its test results deviated from ground truth test suite executions by only 7.5%.
It outperformed the best previously available technique by 6.6x in success rate.
Abstract
The ability to execute the test suite of a project is essential in many scenarios, e.g., to assess code quality and code coverage, to validate code changes made by developers or automated tools, and to ensure compatibility with dependencies. Despite its importance, executing the test suite of a project can be challenging in practice because different projects use different programming languages, software ecosystems, build systems, testing frameworks, and other tools. These challenges make it difficult to create a reliable, universal test execution method that works across different projects. This paper presents ExecutionAgent, an automated technique that prepares scripts for building an arbitrary project from source code and running its test cases. Inspired by the way a human developer would address this task, our approach is a large language model (LLM)-based agent that autonomously…
Taxonomy
Topics: Artificial Intelligence in Law · Digital and Cyber Forensics · Multi-Agent Systems and Negotiation
