ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

Xirui Li; Ming Li; Ion Stoica; Cho-Jui Hsieh; Tianyi Zhou

arXiv:2604.18543 · cs.AI · April 30, 2026

TL;DR

ClawEnvKit is an automated pipeline that generates diverse, verified environments from natural language descriptions, enabling scalable evaluation and training of claw-like agents.

Contribution

The paper introduces ClawEnvKit, the first system for automatic environment generation from natural language, and uses it to construct Auto-ClawEval, a large-scale benchmark for claw-like agents.

Findings

01. Auto-ClawEval matches or exceeds human-curated environments in coherence and clarity.

02. Harness engineering improves agent performance by up to 15.7 percentage points.

03. Automated generation enables evaluation at a scale previously infeasible.

Abstract

Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an autonomous generation pipeline that instantiates this formalism from natural language descriptions. The pipeline comprises three modules: (1) a parser that extracts structured generation parameters from natural language input; (2) a generator that produces the task specification, tool interface, and scoring configuration; and (3) a validator that enforces feasibility, diversity, structural validity, and internal consistency across the generated environments. Using ClawEnvKit, we construct Auto-ClawEval, the first large-scale benchmark for claw-like agents, comprising 1,040…
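
The three-module flow described in the abstract (parser, generator, validator) can be pictured as a small pipeline. The sketch below is a toy illustration only, not ClawEnvKit's code: every name (parse_request, generate, validate, GenParams, Environment) is hypothetical, a regex stands in for whatever model the real parser uses, and the four validator checks are simplified stand-ins for the feasibility, diversity, structural-validity, and consistency criteria.

```python
"""Hypothetical parse -> generate -> validate sketch.

None of these names come from the ClawEnvKit repository; they only
illustrate the data flow described in the abstract.
"""
import hashlib
import json
import re
from dataclasses import dataclass


@dataclass
class GenParams:
    domain: str
    num_tasks: int


@dataclass
class Environment:
    task_spec: str
    tools: list[str]            # tool interface: tool names the agent may call
    scoring: dict[str, float]   # scoring config: "called:<tool>" check -> weight


def parse_request(text: str) -> GenParams:
    """Stage 1 (toy parser): pull structured parameters out of free text."""
    count = re.search(r"(\d+)\s+tasks?", text)
    domain = re.search(r"about\s+(\w+)", text)
    return GenParams(
        domain=domain.group(1) if domain else "general",
        num_tasks=int(count.group(1)) if count else 1,
    )


def generate(params: GenParams) -> list[Environment]:
    """Stage 2 (toy generator): emit task spec, tool interface, scoring config."""
    envs = []
    for i in range(params.num_tasks):
        tools = [f"{params.domain}_search", f"{params.domain}_write"]
        envs.append(Environment(
            task_spec=f"{params.domain} task #{i}: use the tools to finish step {i}",
            tools=tools,
            scoring={f"called:{t}": 1.0 / len(tools) for t in tools},
        ))
    return envs


def fingerprint(env: Environment) -> str:
    """Hash of the task text, used as a crude diversity (dedup) check."""
    return hashlib.sha256(env.task_spec.encode()).hexdigest()


def validate(envs: list[Environment]) -> list[Environment]:
    """Stage 3 (toy validator): keep only environments passing every check."""
    kept, seen = [], set()
    for env in envs:
        # Structural validity: all three components must be present.
        structural = bool(env.task_spec) and bool(env.tools) and bool(env.scoring)
        # Internal consistency: every scoring key must reference a declared tool.
        consistent = all(k.partition(":")[2] in env.tools for k in env.scoring)
        # Feasibility proxy (assumption): weights sum to 1, so a perfect run scores 1.0.
        feasible = abs(sum(env.scoring.values()) - 1.0) < 1e-9
        # Diversity: reject exact duplicates of an already-kept task.
        fp = fingerprint(env)
        diverse = fp not in seen
        if structural and consistent and feasible and diverse:
            kept.append(env)
            seen.add(fp)
    return kept


if __name__ == "__main__":
    params = parse_request("Generate 3 tasks about calendar scheduling")
    envs = validate(generate(params))
    print(params)
    print(json.dumps([e.scoring for e in envs], indent=2))
```

Keeping validation as a separate filter stage means the generator can over-produce candidates and rely on the validator to discard malformed or duplicate ones, which matches the division of labor the abstract describes.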

Code & Models

Repositories

xirui-li/ClawEnvKit (GitHub)

Datasets