Asymmetric Actor-Critic for Multi-turn LLM Agents
Shuli Jiang, Zhaoyang Zhang, Yi Zhang, Shuo Yang, Wei Xia, Stefano Soatto

TL;DR
This paper introduces an asymmetric actor-critic framework where a large proprietary LLM acts as the actor and a smaller open-source model functions as the critic, improving reliability in multi-turn conversational tasks without retraining the actor.
Contribution
The framework lets a smaller critic supervise a fixed large LLM at runtime, improving the reliability and task success of conversational agents without modifying the actor.
Findings
The approach significantly improves task success over strong baselines.
Lightweight open-source critics can match or outperform larger proprietary models.
Critic fine-tuning provides additional performance gains.
Abstract
Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reliable conversational agents. A powerful proprietary LLM acts as the actor, while a smaller open-source critic provides runtime supervision, monitoring the actor's actions and intervening within the same interaction trajectory. Unlike training-based actor-critic methods, our framework supervises a fixed actor operating in open-ended conversational environments. The design leverages a…
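The runtime supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the critic intervenes, so everything below is an assumption made for illustration: the names `Verdict`, `supervised_turn`, `toy_actor`, and `toy_critic`, the approve-or-give-feedback interface, and the `max_revisions` cap are hypothetical and are not taken from the paper. The only ideas carried over from the source are that the actor is fixed (called, never trained) and that the critic checks each proposed action before it is committed, within the same trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical critic output: approve the action or return feedback."""
    approve: bool
    feedback: str = ""


# Hypothetical interfaces. The actor stands in for a fixed proprietary LLM,
# the critic for a smaller open-source model.
Actor = Callable[[List[str], Optional[str]], str]
Critic = Callable[[List[str], str], Verdict]


def supervised_turn(history: List[str], actor: Actor, critic: Critic,
                    max_revisions: int = 1) -> str:
    """Return one action for the current turn, reviewed by the critic.

    The actor proposes an action. The critic inspects it before it is
    committed. On rejection the actor is re-prompted with the critic's
    feedback, so the correction happens inside the same trajectory
    rather than in a retried episode.
    """
    action = actor(history, None)
    for _ in range(max_revisions):
        verdict = critic(history, action)
        if verdict.approve:
            break
        action = actor(history, verdict.feedback)
    return action


# Toy stand-ins so the sketch runs without any model calls.
def toy_actor(history: List[str], feedback: Optional[str]) -> str:
    if feedback is None:
        return "refund issued"
    return "ask user to confirm refund"


def toy_critic(history: List[str], action: str) -> Verdict:
    if "confirm" in action:
        return Verdict(approve=True)
    return Verdict(approve=False, feedback="Confirm with the user first.")


if __name__ == "__main__":
    print(supervised_turn(["user: I want a refund"], toy_actor, toy_critic))
```

In this sketch the actor is only ever called, never updated, which mirrors the fixed-actor setting. Capping revisions with `max_revisions` is a design choice of the sketch to bound extra actor calls per turn, not something stated in the source.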
