Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Yuanzhao Zhai; Tingkai Yang; Kele Xu; Feng Dawei; Cheng Yang; Bo Ding; Huaimin Wang

arXiv:2409.09345 · cs.AI · September 17, 2024

TL;DR

This paper introduces a step-level Q-value model for LLM agents that guides action selection with learned value estimates. The approach significantly improves decision-making performance and applies across different agents and tasks.

Contribution

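The paper proposes a method for estimating step-level Q-values for LLM agents: trajectories are annotated with Q-values via Monte Carlo Tree Search (MCTS), converted into preference data, and fitted by another LLM through step-level Direct Policy Optimization (DPO). The resulting Q-value model guides multi-step decision-making.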
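As a rough illustration of how step-level Q-values could be turned into preference data, the sketch below pairs the highest- and lowest-valued candidate actions at each step. The `StepCandidates` record, the `margin` threshold, and the best-versus-worst pairing rule are assumptions made for this example, not details confirmed by the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class StepCandidates:
    """Hypothetical record: one decision step with Q-annotated candidate actions."""
    state: str                  # task context and history up to this step
    q_values: Dict[str, float]  # candidate action -> estimated Q-value (e.g. from MCTS)


def build_preference_pairs(
    steps: List[StepCandidates], margin: float = 0.1
) -> List[Tuple[str, str, str]]:
    """Return (state, preferred_action, rejected_action) triples.

    Assumed rule: pair the best and worst candidate at each step, and skip
    steps where the Q-value gap is smaller than `margin`.
    """
    pairs = []
    for step in steps:
        if len(step.q_values) < 2:
            continue  # a preference needs at least two candidates
        best = max(step.q_values, key=step.q_values.get)
        worst = min(step.q_values, key=step.q_values.get)
        if step.q_values[best] - step.q_values[worst] >= margin:
            pairs.append((step.state, best, worst))
    return pairs


# Made-up example data, for illustration only.
steps = [
    StepCandidates("search page", {"search[red shoes]": 0.8, "click[back]": 0.2}),
    StepCandidates("results page", {"click[item A]": 0.50, "click[item B]": 0.45}),
]
pairs = build_preference_pairs(steps)
```

With this toy data, only the first step yields a pair, because the two candidates at the second step are too close in value to express a clear preference.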

Findings

01. Q-value models improved agent performance by over 100% on WebShop.

02. Q-value models increased HotPotQA performance by 75%.

03. Method generalizes across different LLM agents and tasks.

Abstract

Agents significantly enhance the capabilities of standalone Large Language Models (LLMs) by perceiving environments, making decisions, and executing actions. However, LLM agents still face challenges in tasks that require multiple decision-making steps. Estimating the value of actions in specific tasks is difficult when intermediate actions are neither appropriately rewarded nor penalized. In this paper, we propose leveraging a task-relevant Q-value model to guide action selection. Specifically, we first collect decision-making trajectories annotated with step-level Q values via Monte Carlo Tree Search (MCTS) and construct preference data. We then use another LLM to fit these preferences through step-level Direct Policy Optimization (DPO), which serves as the Q-value model. During inference, at each decision-making step, LLM agents select the action with the highest Q value before…
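The inference step described in the abstract, choosing the candidate action with the highest Q value, can be sketched as follows. The `q_value_fn` callable stands in for the trained Q-value model, and `toy_q` is a made-up scorer used only for demonstration; how candidates are sampled and how the model is queried are assumptions here.

```python
from typing import Callable, Sequence


def select_action(
    state: str,
    candidates: Sequence[str],
    q_value_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest estimated Q-value for `state`."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda action: q_value_fn(state, action))


def toy_q(state: str, action: str) -> float:
    """Toy stand-in scorer (not a real Q-value model).

    Counts how many words of the action also appear in the state.
    """
    state_words = set(state.lower().split())
    return float(sum(word in state_words for word in action.lower().split()))


# Made-up example: three candidate actions proposed for one decision step.
state = "find red running shoes under 50 dollars"
candidates = ["search blue sandals", "search red running shoes", "click back"]
chosen = select_action(state, candidates, toy_q)
```

In a real agent, `q_value_fn` would wrap a call to the learned Q-value model, and the same selection would be repeated at every decision-making step.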

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services