Training Agents with Weakly Supervised Feedback from Large Language Models
Dihong Gong, Pu Lu, Zelong Wang, Meng Zhou, Xiuqiang He

TL;DR
This paper presents a new training approach for LLM-based agents that uses weak supervision from a critic LLM, enabling iterative improvement without relying on expert trajectories or definitive feedback, and achieves competitive performance with fewer parameters.
Contribution
Introduces a novel weakly supervised training method for LLM agents using a critic LLM, bypassing traditional reliance on expert data or explicit environmental feedback.
Findings
Agents improve consistently across training iterations.
The trained agents perform comparably to GPT-4 despite having fewer parameters.
The method is effective on the API-bank dataset.
Abstract
Large Language Models (LLMs) offer a promising basis for creating agents that can tackle complex tasks through iterative environmental interaction. Existing methods either require these agents to mimic expert-provided trajectories or rely on definitive environmental feedback for reinforcement learning, which limits their application to specific scenarios like gaming or code generation. This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM, bypassing the need for expert trajectories or definitive feedback. Our agents are trained in an iterative manner, where they initially generate trajectories through environmental interaction. Subsequently, a critic LLM selects a subset of good trajectories, which are then used to update the agents, enabling them to generate improved trajectories in the next iteration. Extensive tests on the…
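The generate, select, update cycle described in the abstract can be sketched as a short loop. This is only an illustrative outline, not the authors' implementation: the function `train_with_critic`, its three callables (`generate`, `critic_score`, `update`), and the rule of keeping a top-scoring fraction are all assumptions made for the example. The abstract says only that the critic LLM "selects a subset of good trajectories" and does not specify how.

```python
from typing import Callable, List, Tuple

Trajectory = List[str]             # hypothetical: a trajectory as a list of steps
Sample = Tuple[str, Trajectory]    # (task, trajectory) pair


def train_with_critic(
    generate: Callable[[str], Trajectory],
    critic_score: Callable[[str, Trajectory], float],
    update: Callable[[List[Sample]], None],
    tasks: List[str],
    iterations: int = 3,
    samples_per_task: int = 4,
    keep_fraction: float = 0.5,
) -> List[int]:
    """Run the iterative loop; return how many trajectories were kept per round.

    All three callables are placeholders supplied by the caller:
    - generate: the agent interacting with the environment on one task
    - critic_score: the critic LLM's (weak, possibly noisy) quality judgment
    - update: fine-tuning the agent on the selected trajectories
    """
    kept_counts: List[int] = []
    for _ in range(iterations):
        # 1. The current agent generates several trajectories per task.
        candidates: List[Sample] = [
            (task, generate(task))
            for task in tasks
            for _ in range(samples_per_task)
        ]
        # 2. The critic ranks them; keeping the top fraction is an assumption.
        ranked = sorted(
            candidates, key=lambda s: critic_score(s[0], s[1]), reverse=True
        )
        n_keep = max(1, int(len(ranked) * keep_fraction))
        selected = ranked[:n_keep]
        # 3. The agent is updated on the selected subset before the next round.
        update(selected)
        kept_counts.append(len(selected))
    return kept_counts
```

Passing the agent, critic, and update step in as callables keeps the sketch independent of any particular model or fine-tuning library. In practice `generate` and `update` would share the same underlying agent, so each round samples from the model produced by the previous update.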
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Byte Pair Encoding · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam · Attention Is All You Need · Softmax · Label Smoothing · Dropout · Linear Layer
