Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

Yucheng Shi; Zhenwen Liang; Kishan Panaganti; Dian Yu; Wenhao Yu; Haitao Mi

arXiv:2605.14392 · cs.AI · May 15, 2026

TL;DR

This paper introduces a method for self-improving language models that construct and validate environments to facilitate ongoing learning, emphasizing environment stability and difficulty calibration.

Contribution

It presents EvoEnv, a novel environment synthesis approach that enables models to generate and validate environments for self-improvement in reasoning tasks.

Findings

01. EvoEnv improves reasoning accuracy from 72.4% to 74.8%.

02. Environment synthesis with validation enhances model performance.

03. Stable environment difficulty is key to sustained self-improvement.

Abstract

We pursue a vision for self-improving language models in which the model does not merely generate problems or traces to imitate, but constructs the environments that train it. In zero-data reasoning RL, this reframes self-improvement from a data-generation loop into an environment-construction loop, where each artifact is a reusable executable object that samples instances, computes references, and scores responses. Whether this vision sustains improvement hinges on a single property: the environments must exhibit stable solve-verify asymmetry. That is, the model must be able to write, once, an oracle that it cannot reliably execute in natural language on fresh instances. This asymmetry takes two complementary forms. Some tasks are algorithmically hard to reason through but trivial as code: a dynamic program or graph traversal, compiled once, yields unboundedly many calibrated instances. Others…
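To make the idea of a "reusable executable object" concrete, here is a minimal Python sketch of an environment that samples instances, computes a reference answer, and scores a response. It is an illustration only, not the paper's EvoEnv interface: the class name `LISEnvironment`, its method names, the choice of task (longest strictly increasing subsequence), and the binary reward are all assumptions made for this example.

```python
import random
from bisect import bisect_left


class LISEnvironment:
    """Hypothetical environment (not from the paper): length of the
    longest strictly increasing subsequence of a random integer list."""

    def __init__(self, length=12, max_value=50):
        # Assumed difficulty knobs: longer lists are harder to solve by hand.
        self.length = length
        self.max_value = max_value

    def sample(self, rng):
        """Draw a fresh instance from the environment."""
        return [rng.randint(0, self.max_value) for _ in range(self.length)]

    def prompt(self, instance):
        """Render the instance as a natural-language question."""
        return (
            "What is the length of the longest strictly increasing "
            f"subsequence of {instance}? Answer with a single integer."
        )

    def reference(self, instance):
        """Oracle written once as code: patience sorting, O(n log n)."""
        tails = []  # tails[k] = smallest tail of an increasing run of length k+1
        for x in instance:
            i = bisect_left(tails, x)
            if i == len(tails):
                tails.append(x)
            else:
                tails[i] = x
        return len(tails)

    def score(self, instance, response):
        """Binary reward: 1.0 if the response parses to the reference answer."""
        try:
            answer = int(response.strip())
        except ValueError:
            return 0.0
        return 1.0 if answer == self.reference(instance) else 0.0


if __name__ == "__main__":
    env = LISEnvironment()
    rng = random.Random(0)
    instance = env.sample(rng)
    print(env.prompt(instance))
    print("reward for the oracle's own answer:",
          env.score(instance, str(env.reference(instance))))
```

The oracle is a few lines of code, while answering the same question step by step in natural language becomes error-prone as `length` grows. That gap is one way to read the solve-verify asymmetry described in the abstract, and calling `sample` repeatedly yields as many fresh instances as needed.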
