MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Chuanzhe Guo; Jingjing Wu; Sijun He; Yang Chen; Zhaoqi Kuang; Shilong Fan; Bingjin Chen; Siqi Bao; Jing Liu; Hua Wu; Qingfu Zhu; Wanxiang Che; Haifeng Wang

arXiv:2601.22859·cs.SE·February 3, 2026

TL;DR

MEnvAgent is a scalable multi-language framework that automates the construction of verifiable software environments for LLM-based software engineering. On MEnvBench it improves Fail-to-Pass (F2P) rates by 8.6% over baselines while reducing time costs by 43%.

Contribution

We introduce MEnvAgent, a novel multi-agent framework with environment reuse mechanisms for scalable, verifiable environment construction across multiple programming languages.

Findings

01. Outperforms baselines on MEnvBench, improving the Fail-to-Pass (F2P) rate by 8.6%.

02. Reduces environment construction time by 43%.

03. Creates what the authors describe as the largest open-source polyglot verifiable environment dataset.

Abstract

The evolution of Large Language Model (LLM) agents for software engineering (SWE) is constrained by the scarcity of verifiable datasets, a bottleneck stemming from the complexity of constructing executable environments across diverse languages. To address this, we introduce MEnvAgent, a Multi-language framework for automated Environment construction that facilitates scalable generation of verifiable task instances. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures and integrates a novel Environment Reuse Mechanism that reduces computational overhead by incrementally patching historical environments. Evaluations on MEnvBench, a new benchmark comprising 1,000 tasks across 10 languages, demonstrate that MEnvAgent outperforms baselines, improving Fail-to-Pass (F2P) rates by 8.6% while reducing time costs by 43%.…
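The abstract reports results in terms of Fail-to-Pass (F2P). As a minimal sketch of the usual idea behind that check: a test counts as Fail-to-Pass when it fails in the environment before the fix is applied and passes afterwards. The function names, the dictionary layout, and the per-instance rate below are illustrative assumptions and are not taken from the paper, whose exact F2P rate definition is not given in this listing.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: test name -> True if the test passed, False if it failed.
TestResults = Dict[str, bool]


def fail_to_pass_tests(before: TestResults, after: TestResults) -> List[str]:
    """Return tests recorded as failing before the fix and passing after it."""
    return sorted(
        name
        for name, passed_after in after.items()
        if passed_after and before.get(name) is False
    )


def is_verifiable(before: TestResults, after: TestResults) -> bool:
    """Assumption: an instance is verifiable if it has at least one F2P test."""
    return len(fail_to_pass_tests(before, after)) > 0


def f2p_rate(instances: List[Tuple[TestResults, TestResults]]) -> float:
    """Assumed definition: fraction of instances with at least one F2P test."""
    if not instances:
        return 0.0
    verified = sum(1 for before, after in instances if is_verifiable(before, after))
    return verified / len(instances)


if __name__ == "__main__":
    before = {"test_a": False, "test_b": True, "test_c": False}
    after = {"test_a": True, "test_b": True, "test_c": False}
    print(fail_to_pass_tests(before, after))
    print(f2p_rate([(before, after), ({"t": True}, {"t": True})]))
```

In this sketch, a test that is absent from the pre-fix run is not counted as failing; a real harness would need to decide how to treat tests that the fix itself introduces.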

Taxonomy

Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Software Engineering Research