Offline Reinforcement Learning with Universal Horizon Models

Hojun Chung; Junseo Lee; Songhwai Oh

arXiv:2605.15603·cs.LG·May 18, 2026

TL;DR

This paper introduces universal horizon models (UHM), a model-based offline reinforcement learning approach that predicts future states directly at arbitrary horizons, which avoids the compounding errors of repeated model rollouts and improves long-horizon reasoning.

Contribution

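The paper proposes UHM, a generalization of geometric horizon models (GHM) that directly predicts future states under arbitrary horizons, together with a scalable value learning method that uses a winsorized horizon distribution to cap excessively large horizons and stabilize training.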

Findings

01. Outperforms baselines on 100 OGBench tasks

02. Excels in long-horizon reasoning tasks

03. Shows robustness on suboptimal datasets

Abstract

Model-based reinforcement learning (RL) offers a compelling approach to offline RL by enabling value learning on imagined on-policy trajectories. However, it often suffers from compounding errors due to repeated model inference on self-generated states. While geometric horizon models (GHM) alleviate this issue through direct prediction over a discounted infinite-horizon future, they remain challenged in accurately modeling distant future states. To this end, we introduce universal horizon models (UHM), a generalization of GHM that directly predicts future states under arbitrary horizons. Leveraging this flexibility, we propose a scalable value learning method that employs a winsorized horizon distribution to stabilize training by capping excessively large horizons. Experimental results on 100 challenging OGBench tasks demonstrate that the proposed method outperforms competitive…
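To make the idea of a winsorized horizon distribution concrete, here is a minimal sketch. It rests on assumptions the abstract does not confirm: that the base horizon distribution is geometric with parameter 1 - gamma (as the discounted infinite-horizon framing of GHM suggests), and that winsorizing means moving all probability mass beyond a cap onto the cap itself. The names `gamma`, `h_max`, and all function names are hypothetical and are not taken from the paper or its code.

```python
import random


def sample_geometric_horizon(gamma, rng):
    """Sample h >= 1 with P(h = k) = (1 - gamma) * gamma**(k - 1)."""
    h = 1
    # Each extra step is taken with probability gamma.
    while rng.random() < gamma:
        h += 1
    return h


def sample_winsorized_horizon(gamma, h_max, rng):
    """Assumed winsorization: horizons above h_max are clipped to h_max."""
    return min(sample_geometric_horizon(gamma, rng), h_max)


def winsorized_pmf(gamma, h_max):
    """Exact probabilities of horizons 1..h_max under the clipped distribution."""
    pmf = [(1 - gamma) * gamma ** (k - 1) for k in range(1, h_max)]
    # The whole geometric tail P(h >= h_max) is placed on h_max.
    pmf.append(gamma ** (h_max - 1))
    return pmf


if __name__ == "__main__":
    rng = random.Random(0)
    horizons = [sample_winsorized_horizon(0.99, 50, rng) for _ in range(5)]
    print(horizons)
```

Under these assumptions no sampled horizon exceeds `h_max`, so the model is never queried at the very distant horizons where, according to the abstract, prediction is hardest. How the paper reweights or bootstraps values at the cap is not stated in the abstract and is left out of this sketch.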

Code & Models

Repositories

https://rllab-snu.github.io/projects/UHM