Asymmetric Actor-Critic for Multi-turn LLM Agents

Shuli Jiang; Zhaoyang Zhang; Yi Zhang; Shuo Yang; Wei Xia; Stefano Soatto

arXiv:2604.00304·cs.CL·April 2, 2026

TL;DR

This paper introduces an asymmetric actor-critic framework where a large proprietary LLM acts as the actor and a smaller open-source model functions as the critic, improving reliability in multi-turn conversational tasks without retraining the actor.

Contribution

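The framework places a fixed, large proprietary LLM under runtime supervision by a smaller open-source critic, which monitors the actor's actions and intervenes within the same interaction trajectory. This raises reliability and task success in conversational agents without modifying or retraining the actor.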

Findings

01. The approach significantly improves task success over strong baselines.
02. Lightweight open-source critics can match or outperform larger proprietary models.
03. Critic fine-tuning provides additional performance gains.

Abstract

Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. The design leverages a…
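The runtime supervision described in the abstract can be pictured as a loop in which the critic reviews each proposed action before it is committed to the conversation. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `Verdict`, `supervised_turn`, `max_revisions`, and the toy actor and critic are hypothetical names, and the real actor and critic would be LLM calls rather than hand-written rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical critic output: approve the action or return feedback."""
    approve: bool
    feedback: str = ""


# Hypothetical interfaces; the excerpt does not specify the real ones.
Actor = Callable[[List[str], Optional[str]], str]  # (history, feedback) -> action
Critic = Callable[[List[str], str], Verdict]       # (history, action) -> verdict


def supervised_turn(actor: Actor, critic: Critic, history: List[str],
                    max_revisions: int = 1) -> str:
    """Return the action for one turn after critic review.

    The fixed actor proposes an action; the critic may reject it, in which
    case the actor revises using the critic's feedback. Everything happens
    before the action is committed, so no retry of the whole task is needed.
    After the final allowed revision the action is returned without a
    further critic check.
    """
    action = actor(history, None)
    for _ in range(max_revisions):
        verdict = critic(history, action)
        if verdict.approve:
            break
        action = actor(history, verdict.feedback)
    return action


def toy_actor(history: List[str], feedback: Optional[str]) -> str:
    # Stand-in for a large proprietary LLM: acts eagerly unless corrected.
    return "issue_refund" if feedback is None else "ask_order_id"


def toy_critic(history: List[str], action: str) -> Verdict:
    # Stand-in for a small open-source critic: blocks refunds without an order ID.
    if action == "issue_refund" and not any("order_id" in h for h in history):
        return Verdict(False, "Order ID has not been collected yet.")
    return Verdict(True)


action = supervised_turn(toy_actor, toy_critic, ["user: I want a refund"])
```

In this toy run the critic rejects the first proposal and the actor falls back to asking for the order ID. Only the critic would need training or fine-tuning in such a setup, since the actor is called as a black box.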
