TL;DR
OpenClaw-RL introduces a framework that leverages next-state signals for online reinforcement learning, enabling agents to improve through natural interactions across diverse environments.
Contribution
It unifies evaluative and directive signals in a hybrid RL objective and extends RL infrastructure to real-world, multi-environment agent settings.
Findings
Enables agents to improve via user interactions like re-queries and corrections.
First RL framework to unify real-world agent environments, including terminal, GUI, and tool-call settings.
Demonstrates the utility of next-state signals in long-horizon tasks.
Abstract
Every agent interaction generates a next-state signal, namely the user reply, tool output, terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a live, online learning source. We present OpenClaw-RL, a framework that employs next-state signals to optimize personal agents online through infrastructure and methodology innovations. On the infrastructure side, we extend existing RL systems to a server-client architecture where the RL server hosts the policy behind an inference API and user terminals stream interaction data back over HTTP. From each observed next state, the system extracts two complementary training signals, evaluative and directive, via a separate asynchronous server so that neither signal extraction nor optimization blocks inference. On the methodology side, we introduce a hybrid RL objective that unifies both signal types…
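The non-blocking design described in the abstract can be illustrated with a small sketch. This is not the OpenClaw-RL implementation: the names `Transition`, `Signal`, `extract_signal`, `extractor`, and `serve` are hypothetical, the keyword heuristic is a stand-in for whatever judge the real system uses, and representing the directive signal as text is an assumption. An in-process `asyncio.Queue` stands in for the HTTP stream between user terminals and the RL server. The sketch shows only the decoupling: the serving loop enqueues each observed next state and moves on, while a separate worker turns it into an evaluative and a directive signal.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Transition:
    action: str
    next_state: str  # user reply, tool output, or terminal/GUI state change


@dataclass
class Signal:
    evaluative: float  # scalar judgment of the action (assumed form)
    directive: str     # hint on what to do instead (assumed to be text)


def extract_signal(t: Transition) -> Signal:
    # Hypothetical heuristic: an error or correction in the next state
    # counts as negative feedback and is kept as the directive hint.
    text = t.next_state.lower()
    negative = any(word in text for word in ("error", "wrong", "no,"))
    return Signal(
        evaluative=-1.0 if negative else 1.0,
        directive=t.next_state if negative else "",
    )


async def extractor(queue: asyncio.Queue, out: list) -> None:
    # Runs independently of the serving loop; None is the shutdown sentinel.
    while True:
        t = await queue.get()
        if t is None:
            break
        out.append(extract_signal(t))


async def serve(interactions: list) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    signals: list = []
    worker = asyncio.create_task(extractor(queue, signals))
    for action, next_state in interactions:
        # put_nowait returns immediately, so serving never waits on extraction.
        queue.put_nowait(Transition(action, next_state))
    queue.put_nowait(None)
    await worker
    return signals


def run_demo() -> list:
    return asyncio.run(
        serve([
            ("ls /tmp", "file_a file_b"),
            ("rm x", "Error: no such file"),
        ])
    )
```

Calling `run_demo()` returns one `Signal` per interaction, in arrival order. In a real deployment the extractor would presumably run as its own service and feed an optimizer rather than a list.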
