ClawGym: A Scalable Framework for Building Effective Claw Agents

Fei Bai; Huatong Song; Shuang Sun; Daixuan Cheng; Yike Yang; Chuan Hao; Renyuan Li; Feng Chang; Yuan Wei; Ran Tao; Bryan Dai; Jian Yang; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2604.26904·cs.CL·May 19, 2026

TL;DR

ClawGym is a scalable framework for developing, training, and evaluating Claw-style personal agents. It combines a synthesized task dataset with hybrid verification, a family of trained agent models, and a benchmark.

Contribution

ClawGym provides one systematic platform for synthesizing verifiable training data, training agents on it, and benchmarking the resulting Claw agents, addressing the lack of such a framework for scalable development.

Findings

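01. Constructed ClawGym-SynData, a dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations.

02. Trained ClawGym-Agents via supervised fine-tuning on black-box rollout trajectories, followed by reinforcement learning.

03. Developed ClawGym-Bench, a benchmark of 200 instances for evaluation.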

Abstract

Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement…
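The abstract does not spell out how the hybrid verification works. As a rough illustration only, the sketch below assumes one plausible reading: deterministic checks on the final workspace state combined with rubric items scored by a judge. Every name here (`Task`, `verify`, `stub_judge`, `run_demo`) is hypothetical and not taken from ClawGym, and the judge is a keyword stub standing in for whatever the framework actually uses.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

# Hypothetical types: none of these names come from ClawGym itself.
Check = Callable[[Path], bool]        # deterministic check on workspace state
Judge = Callable[[str, Path], bool]   # stand-in for a rubric-based judge


@dataclass
class Task:
    intent: str                                    # what the persona asks for
    checks: list[Check] = field(default_factory=list)
    rubric: list[str] = field(default_factory=list)


def verify(task: Task, workspace: Path, judge: Judge) -> dict[str, bool]:
    """Pass only if every rule check and every rubric item holds."""
    rule_ok = all(check(workspace) for check in task.checks)
    rubric_ok = all(judge(item, workspace) for item in task.rubric)
    return {"rule_ok": rule_ok, "rubric_ok": rubric_ok,
            "passed": rule_ok and rubric_ok}


def stub_judge(item: str, workspace: Path) -> bool:
    # Assumption: a keyword match against summary.md replaces a real judge.
    summary = workspace / "summary.md"
    return summary.exists() and item.lower() in summary.read_text().lower()


def run_demo() -> dict[str, bool]:
    task = Task(
        intent="Archive my notes and write a short summary",
        checks=[
            lambda ws: (ws / "archive" / "notes.txt").exists(),
            lambda ws: not (ws / "notes.txt").exists(),
        ],
        rubric=["archived"],
    )
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "notes.txt").write_text("buy milk\n")   # mock workspace
        # Stand-in for an agent rollout that edits the workspace.
        (ws / "archive").mkdir()
        (ws / "notes.txt").rename(ws / "archive" / "notes.txt")
        (ws / "summary.md").write_text("Archived 1 note.\n")
        return verify(task, ws, stub_judge)


if __name__ == "__main__":
    print(run_demo())
```

Keeping the two signals separate in the result makes it possible to see whether a failed task broke a hard workspace constraint or only missed a softer rubric item.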


Code & Models

Repositories

ClawGym (GitHub)
