STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order
Chengyang Gu, Yuxin Pan, Hui Xiong, and Yize Chen

TL;DR
STO-RL is an offline reinforcement learning framework that uses large language models to generate temporally ordered subgoals and applies potential-based reward shaping over them, improving performance on sparse-reward, long-horizon tasks.
Contribution
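The paper presents an offline RL approach that uses LLMs to order subgoals in time and applies potential-based reward shaping to that ordering. It targets two weaknesses of existing goal-conditioned and hierarchical offline RL methods in sparse-reward settings: overlooked temporal dependencies among subgoals and imprecise reward shaping.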
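As an illustration of potential-based reward shaping over subgoal stages, here is a minimal sketch. It uses the standard shaping term F(s, s') = gamma * Phi(s') - Phi(s). The potential Phi (stage index divided by the number of stages), the stage_of mapping, the toy integer states, and the zero potential at terminal states are all assumptions made for this example. The summary does not give the paper's exact definitions, so this should not be read as the authors' implementation.

```python
from typing import Callable, Hashable, List, Tuple

# (state, reward, next_state, done) as stored in an offline dataset.
Transition = Tuple[Hashable, float, Hashable, bool]


def shape_rewards(
    transitions: List[Transition],
    stage_of: Callable[[Hashable], int],
    num_stages: int,
    gamma: float = 0.99,
) -> List[float]:
    """Relabel sparse rewards with a potential-based shaping term.

    stage_of is a hypothetical state-to-subgoal-stage mapping
    (0 = no subgoal reached, num_stages = final subgoal reached).
    """

    def potential(state: Hashable) -> float:
        # Assumed potential: fraction of ordered subgoals completed.
        return stage_of(state) / num_stages

    shaped = []
    for state, reward, next_state, done in transitions:
        # Assumed convention: the potential of a terminal state is zero.
        next_phi = 0.0 if done else potential(next_state)
        shaped.append(reward + gamma * next_phi - potential(state))
    return shaped


# Toy trajectory: states are integers that equal their own stage index.
toy = [
    (0, 0.0, 1, False),  # reaches subgoal stage 1
    (1, 0.0, 1, False),  # no progress
    (1, 1.0, 2, True),   # reaches the final stage, sparse reward arrives
]
print(shape_rewards(toy, stage_of=lambda s: s, num_stages=2, gamma=1.0))
```

With gamma set to 1.0 the shaped rewards on this toy trajectory sum to the original return of 1.0, because the potential terms telescope and the initial potential is zero. The learner still sees a non-zero signal at the step where the stage advances.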
Findings
Outperforms state-of-the-art offline RL baselines on multiple benchmarks.
Achieves faster convergence and higher success rates.
Demonstrates robustness to noisy LLM-generated subgoals.
Abstract
Offline reinforcement learning (RL) enables policy learning from pre-collected datasets, avoiding costly and risky online interactions, but it often struggles with long-horizon tasks involving sparse rewards. Existing goal-conditioned and hierarchical offline RL methods decompose such tasks and generate intermediate rewards to mitigate limitations of traditional offline RL, but usually overlook temporal dependencies among subgoals and rely on imprecise reward shaping, leading to suboptimal policies. To address these issues, we propose STO-RL (Offline RL using LLM-Guided Subgoal Temporal Order), an offline RL framework that leverages large language models (LLMs) to generate temporally ordered subgoal sequences and corresponding state-to-subgoal-stage mappings. Using this temporal structure, STO-RL applies potential-based reward shaping to transform sparse terminal rewards into dense,…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Topic Modeling
