STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order

Chengyang Gu, Yuxin Pan, Hui Xiong, and Yize Chen

arXiv:2601.08107·cs.LG·January 14, 2026

Open Access

TL;DR

STO-RL is an offline reinforcement learning framework that uses large language models to generate temporally ordered subgoals and applies potential-based reward shaping over them, improving performance on sparse-reward, long-horizon tasks.

Contribution

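The paper presents an offline RL approach that uses LLMs to order subgoals in time and applies potential-based reward shaping on that ordering. It targets two weaknesses of existing goal-conditioned and hierarchical methods in sparse-reward settings: ignoring temporal dependencies among subgoals and relying on imprecise reward shaping.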

Findings

01 Outperforms state-of-the-art offline RL baselines on multiple benchmarks.

02 Achieves faster convergence and higher success rates.

03 Demonstrates robustness to noisy LLM-generated subgoals.

Abstract

Offline reinforcement learning (RL) enables policy learning from pre-collected datasets, avoiding costly and risky online interactions, but it often struggles with long-horizon tasks involving sparse rewards. Existing goal-conditioned and hierarchical offline RL methods decompose such tasks and generate intermediate rewards to mitigate limitations of traditional offline RL, but usually overlook temporal dependencies among subgoals and rely on imprecise reward shaping, leading to suboptimal policies. To address these issues, we propose STO-RL (Offline RL using LLM-Guided Subgoal Temporal Order), an offline RL framework that leverages large language models (LLMs) to generate temporally ordered subgoal sequences and corresponding state-to-subgoal-stage mappings. Using this temporal structure, STO-RL applies potential-based reward shaping to transform sparse terminal rewards into dense,…
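The reward-shaping idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses the standard potential-based shaping form r' = r + γΦ(s') − Φ(s), and the potential (fraction of subgoal stages completed), the `stage_of` mapping, the toy chain environment, and all function names are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# (state, reward, next_state, done) for one step of an offline trajectory.
Transition = Tuple[int, float, int, bool]


def stage_potential(stage: int, num_stages: int) -> float:
    """Hypothetical potential: fraction of ordered subgoals already completed."""
    return stage / num_stages


def shape_rewards(
    transitions: List[Transition],
    stage_of: Callable[[int], int],
    num_stages: int,
    gamma: float = 0.99,
) -> List[float]:
    """Relabel rewards with r + gamma * phi(s') - phi(s).

    stage_of stands in for a state-to-subgoal-stage mapping (assumed here
    to be given). The potential of a terminal next state is set to zero,
    the usual convention for episodic potential-based shaping.
    """
    shaped = []
    for state, reward, next_state, done in transitions:
        phi = stage_potential(stage_of(state), num_stages)
        phi_next = 0.0 if done else stage_potential(stage_of(next_state), num_stages)
        shaped.append(reward + gamma * phi_next - phi)
    return shaped


# Toy 1-D chain with states 0..4 and a sparse reward only at the end.
# Assumed stage mapping: states 0-1 are stage 0, states 2-3 are stage 1.
trajectory = [
    (0, 0.0, 1, False),
    (1, 0.0, 2, False),  # crosses into stage 1
    (2, 0.0, 3, False),
    (3, 1.0, 4, True),   # sparse terminal reward
]
dense = shape_rewards(trajectory, stage_of=lambda s: s // 2, num_stages=2, gamma=1.0)
print(dense)
```

With `gamma=1.0` the shaping terms telescope, so the shaped rewards sum to the original return plus the terminal potential minus the initial potential, while the agent now receives a signal at the step where it advances to the next subgoal stage.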

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Topic Modeling