Adapting LLM Agents with Universal Feedback in Communication
Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen

TL;DR
This paper introduces Learning through Communication (LTC), a universal feedback-based training method for LLM agents that improves their adaptability and performance across diverse tasks and environments.
Contribution
The paper proposes LTC, which combines a universal feedback buffer with an iterative explore-and-update pipeline, enabling LLM agents to learn from both linguistic feedback and non-linguistic reward signals in single-agent and multi-agent settings.
Findings
LTC outperforms supervised instruction fine-tuning by 3.6% to 12% across the four evaluated datasets (ALFWorld, HotpotQA, Chameleon, GSM8k).
LTC effectively integrates diverse feedback types for improved agent learning.
The approach demonstrates versatility across different task environments.
Abstract
Recent advances in large language models (LLMs) have demonstrated potential for LLM agents. To facilitate the training of these agents with both linguistic feedback and non-linguistic reward signals, we introduce Learning through Communication (LTC). We design a universal buffer to store all the feedback, and an iterative pipeline to enable an LLM agent to explore and update its policy in a given environment. To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments. We evaluate the efficacy of our LTC approach on four diverse datasets: ALFWorld (single-agent), HotpotQA (multi-agent collaboration), Chameleon (multi-agent competition), and GSM8k (multi-agent teacher-student). On these datasets, LTC outperforms the supervised instruction…
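The buffer-plus-pipeline idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the record schema (`FeedbackRecord`), the `UniversalBuffer` class, the `run_ltc_sketch` loop, and the toy `toy_explore` callback are all hypothetical names chosen for illustration, and the update step is left as a caller-supplied function rather than an actual policy-gradient update.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FeedbackRecord:
    """One communication step (hypothetical schema, not taken from the paper)."""
    speaker: str                    # e.g. "agent", "environment", "teacher"
    message: str                    # linguistic content: action text or feedback
    reward: Optional[float] = None  # non-linguistic signal; None when absent


@dataclass
class UniversalBuffer:
    """Stores linguistic and non-linguistic feedback in one place."""
    records: List[FeedbackRecord] = field(default_factory=list)

    def add(self, speaker: str, message: str, reward: Optional[float] = None) -> None:
        self.records.append(FeedbackRecord(speaker, message, reward))

    def drain(self) -> List[FeedbackRecord]:
        """Return everything collected so far and empty the buffer."""
        batch, self.records = self.records, []
        return batch


def run_ltc_sketch(
    explore: Callable[[UniversalBuffer], None],
    update: Callable[[List[FeedbackRecord]], None],
    iterations: int,
) -> List[int]:
    """Alternate exploration and updating; return the batch size per iteration."""
    buffer = UniversalBuffer()
    batch_sizes = []
    for _ in range(iterations):
        explore(buffer)         # exploration phase: agent interacts, feedback is stored
        batch = buffer.drain()  # hand the collected feedback to the learner
        update(batch)           # updating phase: assumed to adjust the agent's policy
        batch_sizes.append(len(batch))
    return batch_sizes


def toy_explore(buffer: UniversalBuffer) -> None:
    """Toy stand-in for one episode: an agent action, then environment feedback."""
    buffer.add("agent", "go to desk 1")
    buffer.add("environment", "You arrive at desk 1.", reward=1.0)


if __name__ == "__main__":
    collected: List[FeedbackRecord] = []
    sizes = run_ltc_sketch(toy_explore, collected.extend, iterations=3)
    print(sizes, len(collected))
```

In this sketch the communication pattern lives entirely inside the `explore` callback, so a single-agent episode and a multi-agent exchange would differ only in which speakers write to the buffer.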
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies · Business Process Modeling and Analysis
Methods: Entropy Regularization · Proximal Policy Optimization
