Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space

Joey Hong; Kang Liu; Zhan Ling; Jiecao Chen; Sergey Levine

arXiv:2512.04601 · cs.LG · February 4, 2026



TL;DR

This paper introduces Natural Language Actor-Critic (NLAC), a novel off-policy training method for LLM agents that uses a generative LLM critic to make learning more stable and efficient in long-horizon, sparse-reward tasks.

Contribution

NLAC is the first actor-critic algorithm for LLMs to employ a generative language critic, which enables off-policy training and provides richer feedback for better exploration and stability.

Findings

01. NLAC outperforms existing methods on reasoning, web browsing, and tool-use tasks.

02. NLAC provides more stable and data-efficient training for LLM agents.

03. Generative language critics improve exploration by offering natural language explanations.

Abstract

Large language model (LLM) agents -- LLMs that dynamically interact with an environment over long horizons -- have become an increasingly important area of research, enabling automation in complex tasks involving tool-use, web browsing, and dialogue with people. In the absence of expert demonstrations, training LLM agents has relied on policy gradient methods that optimize LLM policies with respect to an (often sparse) reward function. However, in long-horizon tasks with sparse rewards, learning from trajectory-level rewards can be noisy, leading to training that is unstable and has high sample complexity. Furthermore, policy improvement hinges on discovering better actions through exploration, which can be difficult when actions lie in natural language space. In this paper, we propose Natural Language Actor-Critic (NLAC), a novel actor-critic algorithm that trains LLM policies using a…
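The abstract's point about trajectory-level rewards can be made concrete with a toy example. The sketch below is a generic, scalar actor-critic illustration, not NLAC itself: NLAC's critic is a generative LLM that produces natural language, which this code does not model. The function names `monte_carlo_returns` and `td_advantages` and the critic values are hypothetical, chosen only to show how a sparse terminal reward gives every step the same credit, while a critic's one-step advantage r_t + γV(s_{t+1}) − V(s_t) can single out the steps that actually made progress.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go G_t = sum_{k>=t} gamma^(k-t) * r_k, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_advantages(rewards: List[float], values: List[float], gamma: float = 1.0) -> List[float]:
    """One-step advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds a (hypothetical) scalar critic's estimate for every
    visited state, including the final one, so it needs len(rewards) + 1 entries.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    return [rewards[t] + gamma * values[t + 1] - values[t] for t in range(len(rewards))]


# Sparse reward: the agent is rewarded only at the final step.
rewards = [0.0, 0.0, 0.0, 1.0]
# Made-up critic estimates for states s_0..s_4 (s_4 is terminal, so 0).
values = [0.5, 0.5, 0.75, 0.75, 0.0]

print(monte_carlo_returns(rewards))    # [1.0, 1.0, 1.0, 1.0]
print(td_advantages(rewards, values))  # [0.0, 0.25, 0.0, 0.25]
```

With only the trajectory-level reward, all four steps receive the same credit of 1.0, so a policy-gradient update cannot tell which action helped. The critic-based advantages concentrate credit on steps 1 and 3, where the estimated value rises. This is the standard motivation for actor-critic methods in general; how NLAC obtains and uses its natural language critique is described in the paper, not here.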


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Artificial Intelligence in Healthcare and Education