SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph

Jiazheng Li; Yawei Wang; David Yan; Yijun Tian; Zhichao Xu; Huan Song; Panpan Xu; Lin Lee Cheong

arXiv:2510.20022·cs.LG·October 24, 2025

TL;DR

SALT is a lightweight, graph-based framework that derives step-level advantages from outcome rewards, improving performance on long-horizon RL tasks without modifying the underlying algorithm.

Contribution

SALT introduces a novel graph-based advantage assignment method that can be integrated into group-based RL algorithms, such as GRPO, to better handle multi-step, long-horizon tasks.

Findings

1. Consistently improves performance across multiple benchmarks.

2. Seamlessly integrates with existing RL algorithms with minimal overhead.

3. Validates design choices through thorough analysis.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities, enabling language agents to excel at single-turn tasks. However, their application to complex, multi-step, and long-horizon tasks remains challenging. While reinforcement learning (RL) offers a promising avenue for addressing these challenges, mainstream approaches typically rely solely on sparse, outcome-based rewards, a limitation that becomes especially problematic for group-based RL algorithms lacking critic models, such as Group Relative Policy Optimization (GRPO). In such methods, uniformly rewarding or penalizing all actions within a trajectory can lead to training instability and suboptimal policies, because beneficial and detrimental actions are often entangled across multi-step interactions. To address this challenge, we propose SALT, a novel and lightweight framework that provides a finer-grained…
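To make the problem described in the abstract concrete, here is a minimal sketch of the uniform, trajectory-level advantage that group-based methods such as GRPO assign when only an outcome reward is available. It illustrates the baseline behavior, not SALT itself. The function name `grpo_uniform_advantages`, the use of a population standard deviation, and the `eps` stabilizer are assumptions made for this example.

```python
from statistics import mean, pstdev


def grpo_uniform_advantages(outcome_rewards, steps_per_trajectory, eps=1e-8):
    """Normalize each outcome reward against its group, then copy the
    resulting advantage to every step of that trajectory.

    outcome_rewards: one scalar reward per trajectory in the group.
    steps_per_trajectory: number of steps (actions) in each trajectory.
    eps: hypothetical stabilizer to avoid division by zero.
    """
    mu = mean(outcome_rewards)
    sigma = pstdev(outcome_rewards)  # population std; an assumption here
    per_step = []
    for reward, n_steps in zip(outcome_rewards, steps_per_trajectory):
        adv = (reward - mu) / (sigma + eps)
        # Every step receives the same value, whether the individual
        # action helped or hurt the final outcome.
        per_step.append([adv] * n_steps)
    return per_step


# Group of four rollouts for one task: two succeed, two fail.
advantages = grpo_uniform_advantages([1.0, 0.0, 0.0, 1.0], [3, 2, 4, 1])
for trajectory in advantages:
    print(trajectory)
```

Each inner list holds one repeated value, so a helpful action inside a failed trajectory is penalized exactly as much as the action that caused the failure. This is the entanglement that step-level advantage assignment is meant to address.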

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics