Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
Joey Hong, Kang Liu, Zhan Ling, Jiecao Chen, Sergey Levine

TL;DR
This paper introduces Natural Language Actor-Critic (NLAC), a novel off-policy training method for LLM agents that uses a generative LLM critic to improve learning stability and efficiency in long-horizon, sparse reward tasks.
Contribution
NLAC is presented as the first actor-critic algorithm for LLMs that employs a generative language critic, which enables off-policy training and gives the policy richer feedback than a scalar reward, improving exploration and stability.
Findings
NLAC outperforms existing methods in reasoning, web browsing, and tool-use tasks.
It provides more stable and data-efficient training for LLM agents.
Generative language critics enhance exploration by offering natural language explanations.
Abstract
Large language model (LLM) agents -- LLMs that dynamically interact with an environment over long horizons -- have become an increasingly important area of research, enabling automation in complex tasks involving tool-use, web browsing, and dialogue with people. In the absence of expert demonstrations, training LLM agents has relied on policy gradient methods that optimize LLM policies with respect to an (often sparse) reward function. However, in long-horizon tasks with sparse rewards, learning from trajectory-level rewards can be noisy, leading to training that is unstable and has high sample complexity. Furthermore, policy improvement hinges on discovering better actions through exploration, which can be difficult when actions lie in natural language space. In this paper, we propose Natural Language Actor-Critic (NLAC), a novel actor-critic algorithm that trains LLM policies using a…
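To make the abstract's point about noisy trajectory-level rewards concrete, the sketch below contrasts two ways of assigning credit in a sparse-reward episode. It is a toy illustration only, not NLAC itself: NLAC's critic generates natural language, whereas this example assumes an ordinary scalar value critic, and the function names, rewards, and value estimates are all hypothetical.

```python
from typing import List


def trajectory_level_weights(rewards: List[float]) -> List[float]:
    """Weight every step by the total episode return (REINFORCE-style credit)."""
    total = sum(rewards)
    return [total for _ in rewards]


def td_advantages(
    rewards: List[float], values: List[float], gamma: float = 1.0
) -> List[float]:
    """One-step advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`,
    where the last entry is the value of the terminal state.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


# Hypothetical 4-step episode with a single sparse reward at the end.
rewards = [0.0, 0.0, 0.0, 1.0]

# Hypothetical scalar critic estimates (assumed, not learned here).
# The jump from 0.25 to 0.75 marks the second action as the decisive one.
values = [0.25, 0.25, 0.75, 0.75, 0.0]

traj_weights = trajectory_level_weights(rewards)
step_advantages = td_advantages(rewards, values)

print("trajectory-level credit:", traj_weights)
print("per-step advantages:    ", step_advantages)
```

With trajectory-level credit, all four actions receive the same weight regardless of which one mattered. The per-step advantages concentrate credit on the step where the critic's estimate changed, which is the general motivation for a critic in long-horizon, sparse-reward settings.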
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Artificial Intelligence in Healthcare and Education
