You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects

Islem Bouzenia; Michael Pradel

arXiv:2412.10133·cs.SE·May 1, 2025·2 cites

TL;DR

This paper introduces ExecutionAgent, an LLM-based agent that autonomously prepares and executes the test suites of diverse projects. It successfully ran the tests of 33 out of 50 projects, with results deviating from the ground truth by only 7.5%.

Contribution

The paper presents a novel LLM-driven agent that automates cross-project test execution, handling multiple languages and tools with minimal human intervention.

Findings

1. Successfully executed the tests of 33 out of 50 projects (66%).
2. Test results deviated from the ground truth by only 7.5%.
3. Outperformed previous techniques by 6.6x in success rate.

Abstract

The ability to execute the test suite of a project is essential in many scenarios, e.g., to assess code quality and code coverage, to validate code changes made by developers or automated tools, and to ensure compatibility with dependencies. Despite its importance, executing the test suite of a project can be challenging in practice because different projects use different programming languages, software ecosystems, build systems, testing frameworks, and other tools. These challenges make it difficult to create a reliable, universal test execution method that works across different projects. This paper presents ExecutionAgent, an automated technique that prepares scripts for building an arbitrary project from source code and running its test cases. Inspired by the way a human developer would address this task, our approach is a large language model (LLM)-based agent that autonomously…

Code & Models

Repositories

sola-st/executionagent (official)

Taxonomy

Topics: Artificial Intelligence in Law · Digital and Cyber Forensics · Multi-Agent Systems and Negotiation