OpenClaw-RL: Train Any Agent Simply by Talking

Yinjie Wang; Xuyang Chen; Xiaolong Jin; Mengdi Wang; Ling Yang

arXiv:2603.10165 · cs.CL · May 12, 2026

TL;DR

OpenClaw-RL introduces a framework that leverages next-state signals for online reinforcement learning, enabling agents to improve through natural interactions across diverse environments.

Contribution

It unifies evaluative and directive signals in a hybrid RL objective and extends RL infrastructure to real-world, multi-environment agent settings.
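The abstract is cut off before it states the hybrid objective, so what follows is only a hypothetical sketch of one way the two signal types could be combined. It is not the paper's formulation. It uses a GRPO-style, group-normalized policy-gradient term driven by the scalar evaluative signal, plus a weighted log-likelihood term on directive targets such as a user-supplied corrected action. The `Sample` fields, `directive_weight`, and the group normalization are all assumptions. The sketch uses plain floats so it runs without a deep-learning framework. A real trainer would use tensors and backpropagate through the loss.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    # Per-token log-probs of the agent's own action under the current policy.
    action_logprobs: List[float]
    # Scalar evaluative signal from the next state (hypothetical scale, e.g. -1..1).
    reward: float
    # Per-token log-probs of a directive target (e.g. a corrected action), if any.
    directive_logprobs: Optional[List[float]] = None


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def hybrid_loss(samples: List[Sample], directive_weight: float = 0.5) -> float:
    """Hypothetical hybrid loss: policy-gradient term + weighted supervised term."""
    advs = group_advantages([s.reward for s in samples])
    # Evaluative part: minimizing -A * mean(log p) raises log p when A > 0.
    pg_terms = [
        -a * sum(s.action_logprobs) / len(s.action_logprobs)
        for s, a in zip(samples, advs)
    ]
    pg_loss = sum(pg_terms) / len(pg_terms)
    # Directive part: negative log-likelihood of the directive target tokens.
    directed = [s for s in samples if s.directive_logprobs]
    if directed:
        sft_loss = sum(
            -sum(s.directive_logprobs) / len(s.directive_logprobs) for s in directed
        ) / len(directed)
    else:
        sft_loss = 0.0
    return pg_loss + directive_weight * sft_loss
```

The additive form keeps the two signals separable. Samples that carry only an evaluative score still contribute a gradient, and a correction adds a direct imitation target on top.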

Findings

1. Enables agents to improve from ordinary user interactions such as re-queries and corrections.
2. First RL framework to unify real-world agent environments, including terminal, GUI, and tool-call settings.
3. Demonstrates the utility of next-state signals in long-horizon tasks.
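This page does not say how a next state is turned into training signals. The sketch below is a purely illustrative stand-in built from hand-written rules, and it assumes several things. A repeated request (a re-query) yields a negative evaluative score. A reply that opens with a correction cue yields a negative score and also returns the correction text as a directive. Terminal exit codes and tool error strings yield evaluative scores. The `NextState` type, the cue list, and the score values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical phrases that mark a user message as a correction.
CORRECTION_CUES = ("no,", "actually", "instead", "you should")


@dataclass
class NextState:
    kind: str                        # hypothetical taxonomy: "user", "tool", "terminal", "gui"
    content: str
    exit_code: Optional[int] = None  # only meaningful for terminal states


def extract_signals(prev_user_msg: str, state: NextState) -> Tuple[float, Optional[str]]:
    """Return (evaluative score, directive text or None) using toy rules."""
    text = state.content.strip()
    lowered = text.lower()
    if state.kind == "user":
        if lowered == prev_user_msg.strip().lower():
            return -1.0, None        # re-query: the request was not satisfied
        if lowered.startswith(CORRECTION_CUES):
            return -0.5, text        # correction: negative, and carries a directive
        return 0.0, None             # neutral follow-up
    if state.kind == "terminal" and state.exit_code is not None:
        return (1.0 if state.exit_code == 0 else -1.0), None
    if state.kind == "tool" and "error" in lowered:
        return -1.0, None
    return 0.0, None
```

For example, `extract_signals("list files", NextState("user", "Actually, only show .py files"))` returns `(-0.5, "Actually, only show .py files")`. A practical system would likely replace these brittle rules with a learned or model-based judge.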

Abstract

Every agent interaction generates a next-state signal, namely the user reply, tool output, terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a live, online learning source. We present OpenClaw-RL, a framework that employs next-state signals to optimize personal agents online through infrastructure and methodology innovations. On the infrastructure side, we extend existing RL systems to a server-client architecture where the RL server hosts the policy behind an inference API and user terminals stream interaction data back over HTTP. From each observed next state, the system extracts two complementary training signals, evaluative and directive, via a separate asynchronous server so that neither signal extraction nor optimization blocks inference. On the methodology side, we introduce a hybrid RL objective that unifies both signal types…
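The decoupling described above can be illustrated in a single process. In the abstract's design, signal extraction and optimization never block inference. In this minimal sketch, an `asyncio.Queue` stands in for the HTTP stream from user terminals, and a background coroutine stands in for the separate asynchronous extraction server. `policy_infer` is a hypothetical placeholder for the hosted policy's inference API, and the scoring rule is a toy.

```python
import asyncio
from typing import List, Tuple

Record = Tuple[str, str, float]  # (action, next_state, evaluative score)


async def policy_infer(prompt: str) -> str:
    # Hypothetical stand-in for the policy served behind an inference API.
    await asyncio.sleep(0)
    return f"action for: {prompt}"


async def extraction_worker(inbox: asyncio.Queue, train_buffer: List[Record]) -> None:
    # Runs independently of inference: scores (action, next_state) pairs and
    # appends them to a buffer that an optimizer would consume.
    while True:
        action, next_state = await inbox.get()
        score = -1.0 if "error" in next_state.lower() else 1.0  # toy evaluative rule
        train_buffer.append((action, next_state, score))
        inbox.task_done()


async def client_session(inbox: asyncio.Queue, steps: List[Tuple[str, str]]) -> List[str]:
    actions = []
    for prompt, next_state in steps:
        action = await policy_infer(prompt)
        # Stream the interaction back without waiting for extraction to finish.
        inbox.put_nowait((action, next_state))
        actions.append(action)
    return actions


async def main(steps: List[Tuple[str, str]]) -> Tuple[List[str], List[Record]]:
    inbox: asyncio.Queue = asyncio.Queue()  # unbounded, so put_nowait never raises
    buffer: List[Record] = []
    worker = asyncio.create_task(extraction_worker(inbox, buffer))
    actions = await client_session(inbox, steps)
    await inbox.join()  # the demo waits for extraction only at shutdown
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return actions, buffer


if __name__ == "__main__":
    _, records = asyncio.run(main([("ls", "a.txt b.txt"),
                                   ("cat missing.txt", "Error: no such file")]))
    for record in records:
        print(record)
```

Because the client only enqueues and moves on, response latency does not depend on how slow extraction or training is. In the real system, the queue would be a network boundary between processes.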

Code & Models

Repositories

gen-verse/OpenClaw-RL (GitHub)

Models

Rabornkraken/qwen3.5-27b-agent-grpo-v2 (Hugging Face)