Training Agents with Weakly Supervised Feedback from Large Language Models

Dihong Gong; Pu Lu; Zelong Wang; Meng Zhou; Xiuqiang He

arXiv:2411.19547 · cs.CL · December 2, 2024

TL;DR

This paper presents a new training approach for LLM-based agents that uses weak supervision from a critic LLM, enabling iterative improvement without relying on expert trajectories or definitive feedback, and achieves competitive performance with fewer parameters.

Contribution

Introduces a novel weakly supervised training method for LLM agents using a critic LLM, bypassing traditional reliance on expert data or explicit environmental feedback.

Findings

01 Agents show consistent improvement over iterations.

02 Performance comparable to GPT-4 with fewer parameters.

03 Effective on the API-Bank dataset.

Abstract

Large Language Models (LLMs) offer a promising basis for creating agents that can tackle complex tasks through iterative environmental interaction. Existing methods either require these agents to mimic expert-provided trajectories or rely on definitive environmental feedback for reinforcement learning, which limits their application to specific scenarios like gaming or code generation. This paper introduces a novel training method for LLM-based agents using weakly supervised signals from a critic LLM, bypassing the need for expert trajectories or definitive feedback. Our agents are trained in an iterative manner: they first generate trajectories through environmental interaction. Subsequently, a critic LLM selects a subset of good trajectories, which are then used to update the agents, enabling them to generate improved trajectories in the next iteration. Extensive tests on the…
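
The generate, select, update loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `train_iteratively`, `ToyAgent`, and `noisy_critic`, the `keep_ratio` parameter, and the choice to select by ranking critic scores are all assumptions made for illustration. The "agent" here is a toy with a single numeric skill value, and the "critic" is a noisy scoring function standing in for a critic LLM.

```python
import random
from typing import Callable, List, Tuple

# A trajectory is modeled as (task, quality); a real trajectory would be a
# sequence of agent actions and environment observations.
Trajectory = Tuple[str, float]


def train_iteratively(
    tasks: List[str],
    generate: Callable[[str], Trajectory],
    critic_score: Callable[[Trajectory], float],
    update: Callable[[List[Trajectory]], None],
    iterations: int = 3,
    samples_per_task: int = 4,
    keep_ratio: float = 0.5,  # hypothetical knob, not taken from the paper
) -> List[List[Trajectory]]:
    """Run the generate -> critic-select -> update loop; return selections."""
    history = []
    for _ in range(iterations):
        # 1. The current agent generates several trajectories per task.
        trajectories = [
            generate(task) for task in tasks for _ in range(samples_per_task)
        ]
        # 2. The critic scores them; keep the top fraction (assumed rule).
        ranked = sorted(trajectories, key=critic_score, reverse=True)
        selected = ranked[: max(1, int(len(ranked) * keep_ratio))]
        # 3. The agent is updated on the selected trajectories only.
        update(selected)
        history.append(selected)
    return history


class ToyAgent:
    """Stand-in for an LLM agent: one scalar 'skill' instead of weights."""

    def __init__(self, seed: int = 0) -> None:
        self.skill = 0.0
        self.rng = random.Random(seed)

    def generate(self, task: str) -> Trajectory:
        # Trajectory quality scatters around the agent's current skill.
        return (task, self.skill + self.rng.gauss(0.0, 1.0))

    def update(self, selected: List[Trajectory]) -> None:
        # Toy "fine-tuning": move skill to the mean quality of kept samples.
        self.skill = sum(q for _, q in selected) / len(selected)


critic_rng = random.Random(1)


def noisy_critic(trajectory: Trajectory) -> float:
    # Weak supervision: the score tracks true quality only approximately.
    return trajectory[1] + critic_rng.gauss(0.0, 0.5)


tasks = ["book_flight", "check_weather"]  # made-up task names
agent = ToyAgent()
history = train_iteratively(tasks, agent.generate, noisy_critic, agent.update)
print(f"skill after {len(history)} iterations: {agent.skill:.2f}")
```

Because the critic is noisy, some weak trajectories may be kept and some good ones discarded in any single round; the sketch only shows the shape of the loop, in which no expert trajectory or exact reward is ever consulted.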

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Byte Pair Encoding · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam · Attention Is All You Need · Softmax · Label Smoothing · Dropout · Linear Layer