RRO: LLM Agent Optimization Through Rising Reward Trajectories

Zilong Wang; Jingfeng Yang; Sreyashi Nag; Samarth Varshney; Xianfeng Tang; Haoming Jiang; Jingbo Shang; Sheikh Muhammad Sarwar

arXiv:2505.20737 · cs.AI · May 28, 2025

TL;DR

This paper introduces Reward Rising Optimization (RRO), a method for training large language model agents on trajectories whose rewards rise from one step to the next, which improves performance while reducing the computational cost of exploration.

Contribution

The paper proposes RRO, a new approach that leverages rising reward trends in trajectories for efficient process supervision in LLM agents, reducing exploration costs.

Findings

01. RRO outperforms existing methods on WebShop and InterCode-SQL benchmarks.

02. RRO achieves higher accuracy with less exploration.

03. The approach is mathematically grounded and empirically validated.

Abstract

Large language models (LLMs) have exhibited extraordinary performance in a variety of tasks, yet it remains challenging for them to solve complex multi-step tasks as agents. In practice, agents are sensitive to the outcome of certain key steps, which makes them likely to fail the task because of a subtle mistake in the planning trajectory. Recent approaches resort to calibrating the reasoning process through reinforcement learning. They reward or penalize every reasoning step with process supervision, an approach known as Process Reward Models (PRMs). However, PRMs are difficult and costly to scale up with a large number of next-action candidates, since acquiring the training data through per-step trajectory exploration requires extensive computation. To mitigate this issue, we focus on the relative reward trend across successive reasoning steps and propose maintaining an increasing reward…
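The idea of keeping the reward rising between successive steps can be illustrated with a small sketch. This is not the authors' implementation: the function name `sample_until_rising`, the `propose` and `score` callables, the `max_samples` budget, and the fallback to the best-scoring candidate are all assumptions made for illustration. It shows one way to stop exploring next-action candidates as soon as one scores higher than the previous step, instead of scoring a fixed, large candidate set.

```python
from typing import Callable, List, Tuple


def sample_until_rising(
    propose: Callable[[], str],     # hypothetical: returns one next-action candidate
    score: Callable[[str], float],  # hypothetical: process reward for a candidate
    prev_reward: float,             # reward of the previous step
    max_samples: int = 5,           # assumed exploration budget per step
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Sample candidates until one has a reward above prev_reward.

    Returns the selected action, its reward, and every (action, reward)
    pair that was scored along the way.
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")

    tried: List[Tuple[str, float]] = []
    for _ in range(max_samples):
        action = propose()
        reward = score(action)
        tried.append((action, reward))
        if reward > prev_reward:
            # Rising reward found: stop exploring early.
            return action, reward, tried

    # Assumption: if the budget runs out, fall back to the best candidate seen.
    best_action, best_reward = max(tried, key=lambda pair: pair[1])
    return best_action, best_reward, tried


# Toy usage with a fixed candidate stream and a lookup-table reward.
candidates = iter(["a", "b", "c", "d"])
toy_rewards = {"a": 0.1, "b": 0.2, "c": 0.6, "d": 0.9}
action, reward, tried = sample_until_rising(
    propose=lambda: next(candidates),
    score=toy_rewards.__getitem__,
    prev_reward=0.5,
    max_samples=4,
)
print(action, reward, len(tried))
```

In this toy run the loop stops at the third candidate and never scores the fourth, which is where the saving in exploration would come from under these assumptions.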

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Simulation Techniques and Applications