ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
Xirui Li, Ming Li, Ion Stoica, Cho-Jui Hsieh, Tianyi Zhou

TL;DR
ClawEnvKit is an automated pipeline that generates diverse, verified environments from natural language descriptions, enabling scalable evaluation and training of claw-like agents.
Contribution
The paper introduces ClawEnvKit, the first system for automatic environment generation from natural language, and uses it to construct Auto-ClawEval, a large-scale benchmark for claw-like agents.
Findings
Auto-ClawEval matches or exceeds human-curated environments in coherence and clarity.
Harness engineering improves agent performance by up to 15.7 percentage points.
Automated generation enables evaluation at a scale previously infeasible.
Abstract
Constructing environments for training and evaluating claw-like agents remains a manual, human-intensive process that does not scale. We argue that what is needed is not just a dataset, but an automated pipeline capable of generating diverse, verified environments on demand. To this end, we introduce ClawEnvKit, an autonomous generation pipeline that instantiates this formalism from natural language descriptions. The pipeline comprises three modules: (1) a parser that extracts structured generation parameters from natural language input; (2) a generator that produces the task specification, tool interface, and scoring configuration; and (3) a validator that enforces feasibility, diversity, structural validity, and internal consistency across the generated environments. Using ClawEnvKit, we construct Auto-ClawEval, the first large-scale benchmark for claw-like agents, comprising 1,040…
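The three-module structure described in the abstract (parser, generator, validator) can be pictured with a minimal sketch. This is not the ClawEnvKit implementation: every name below (`GenerationParams`, `Environment`, `parse`, `generate`, `validate`), the input phrasing the toy parser accepts, the placeholder tool names, and the specific checks are assumptions made for illustration. The feasibility check mentioned in the abstract is omitted because the summary does not say how it is performed.

```python
import re
from dataclasses import dataclass


@dataclass
class GenerationParams:
    """Hypothetical structured parameters extracted from the description."""
    domain: str
    count: int


@dataclass
class Environment:
    """Hypothetical environment: task specification, tool interface, scoring."""
    task_spec: str
    tools: list[str]
    scoring: dict[str, float]


def parse(description: str) -> GenerationParams:
    # Toy parser (assumption): expects "<N> environment(s) for/about <domain>".
    match = re.search(r"(\d+)\s+environments?\s+(?:for|about)\s+(.+)", description)
    if match is None:
        raise ValueError("description does not match the toy pattern")
    domain = match.group(2).strip().rstrip(".")
    return GenerationParams(domain=domain, count=int(match.group(1)))


def generate(params: GenerationParams) -> list[Environment]:
    # Toy generator (assumption): fixed placeholder tools, equal scoring weights.
    envs = []
    for i in range(params.count):
        tools = ["search", "write_file"]
        envs.append(
            Environment(
                task_spec=f"Task {i + 1}: complete a {params.domain} request",
                tools=tools,
                scoring={tool: 1.0 / len(tools) for tool in tools},
            )
        )
    return envs


def validate(envs: list[Environment]) -> list[Environment]:
    # Keeps environments that pass three illustrative checks:
    # structural validity, internal consistency, and diversity.
    seen_specs = set()
    kept = []
    for env in envs:
        structurally_valid = bool(env.task_spec) and bool(env.tools)
        # Consistency: scoring only refers to declared tools, weights sum to 1.
        consistent = (
            set(env.scoring) <= set(env.tools)
            and abs(sum(env.scoring.values()) - 1.0) < 1e-9
        )
        diverse = env.task_spec not in seen_specs
        if structurally_valid and consistent and diverse:
            seen_specs.add(env.task_spec)
            kept.append(env)
    return kept


params = parse("Generate 3 environments for email triage")
environments = validate(generate(params))
```

In this sketch the validator acts as a filter over the generator's output, so duplicated or internally inconsistent environments are dropped rather than repaired. Whether the actual pipeline filters, repairs, or regenerates rejected environments is not stated in the summary.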
