SetupBench: Assessing Software Engineering Agents' Ability to Bootstrap Development Environments
Avi Arora, Jinu Jang, Roshanak Zilouchian Moghaddam

TL;DR
SetupBench is a comprehensive benchmark designed to evaluate LLM agents' ability to bootstrap software development environments from scratch, revealing significant gaps in current agent capabilities and exploration strategies.
Contribution
We introduce SetupBench, a novel benchmark for assessing environment-bootstrap skills of software agents, covering diverse ecosystems and identifying key failure modes.
Findings
Low success rates in environment setup tasks
Systematic failure modes like incomplete tooling installation
High inefficiency in agent exploration strategies
Abstract
Modern Large Language Model (LLM) agents promise end-to-end assistance with real-world software tasks, yet existing benchmarks evaluate LLM agents almost exclusively in pre-baked environments where every dependency is pre-installed. To fill this gap, we introduce SetupBench, a 93-instance benchmark that isolates the environment-bootstrap skill: starting from a bare Linux sandbox, an agent must install packages, resolve dependency conflicts, initialize databases, and configure background services. Our tasks span seven language ecosystems, five database engines, and multi-service orchestration scenarios, each accompanied by a natural-language problem statement and a deterministic success command. Through evaluation of OpenHands, a state-of-the-art coding agent, we find low success rates across task categories, with particular challenges in repository setup (38.9-57.4%) and local database…
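To make the task format concrete, here is a minimal Python sketch of how a task pairing a problem statement with a success command might be represented and checked. The class name, field names, and the `check_success` helper are hypothetical and are not SetupBench's actual schema. Treating a zero exit code as a pass is also an assumption: the abstract says only that each task has a deterministic success command.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class SetupTask:
    """Hypothetical task record; field names are illustrative only."""
    instance_id: str
    problem_statement: str   # natural-language description given to the agent
    success_command: str     # shell command run after the agent finishes


def check_success(task: SetupTask, timeout_s: int = 60) -> bool:
    """Run the task's success command and report pass/fail.

    Assumption: exit code 0 means the environment was set up correctly.
    """
    try:
        result = subprocess.run(
            task.success_command,
            shell=True,            # the command is a shell string
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

In a harness along these lines, the agent would first work inside the sandbox, and `check_success` would be called afterwards against the resulting environment.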
Taxonomy
Topics: Software Engineering Techniques and Practices · Model-Driven Software Engineering Techniques · Service-Oriented Architecture and Web Services
