Double Horizon Model-Based Policy Optimization

Akihiro Kubo; Paavo Parmas; Shin Ishii

arXiv:2512.15439·cs.LG·December 18, 2025

Open Access

TL;DR

The paper introduces DHMBPO, a novel model-based reinforcement learning method that employs two different rollout horizons to better balance bias, variance, and distribution shift, leading to improved efficiency and stability.

Contribution

The paper proposes a double-horizon approach that splits model rollouts into a distribution phase and a training phase, each with its own horizon, to resolve the conflict between the two optimal rollout horizons in model-based RL.

Findings

01. Outperforms existing MBRL methods on continuous-control benchmarks.

02. Achieves higher sample efficiency and lower runtime.

03. Effectively balances distribution shift, model bias, and gradient variance.

Abstract

Model-based reinforcement learning (MBRL) reduces the cost of real-environment sampling by generating synthetic trajectories (called rollouts) from a learned dynamics model. However, choosing the length of the rollouts poses two dilemmas: (1) Longer rollouts better preserve on-policy training but amplify model bias, indicating the need for an intermediate horizon to mitigate distribution shift (i.e., the gap between on-policy and past off-policy samples). (2) Moreover, a longer model rollout may reduce value estimation bias but raise the variance of policy gradients due to backpropagation through multiple steps, implying another intermediate horizon for stable gradient estimates. However, these two optimal horizons may differ. To resolve this conflict, we propose Double Horizon Model-Based Policy Optimization (DHMBPO), which divides the rollout procedure into a long "distribution…
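The two-horizon idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy scalar dynamics model, reward, linear policy, value function, and every function name below are hypothetical and chosen only for illustration. The sketch also only evaluates the objective; the abstract refers to backpropagating through the rollout steps to obtain policy gradients, which is omitted here.

```python
from typing import List, Sequence


def model_step(state: float, action: float) -> float:
    """Hypothetical learned dynamics model: a toy scalar linear system."""
    return 0.9 * state + 0.1 * action


def reward_fn(state: float, action: float) -> float:
    """Hypothetical reward: penalize distance from the origin and effort."""
    return -(state ** 2) - 0.01 * (action ** 2)


def policy(state: float, gain: float) -> float:
    """Hypothetical deterministic linear policy a = -gain * s."""
    return -gain * state


def value_fn(state: float) -> float:
    """Hypothetical value estimate used to bootstrap at the rollout end."""
    return -(state ** 2)


def distribution_rollout(
    start_states: Sequence[float], gain: float, horizon: int
) -> List[float]:
    """Roll the model forward from stored (off-policy) states under the
    current policy and collect every visited state."""
    visited = []
    for s in start_states:
        for _ in range(horizon):
            s = model_step(s, policy(s, gain))
            visited.append(s)
    return visited


def training_rollout_return(
    state: float, gain: float, horizon: int, discount: float = 0.99
) -> float:
    """Discounted model-based return over `horizon` steps plus a
    bootstrapped terminal value."""
    ret, scale = 0.0, 1.0
    for _ in range(horizon):
        a = policy(state, gain)
        ret += scale * reward_fn(state, a)
        state = model_step(state, a)
        scale *= discount
    return ret + scale * value_fn(state)


def double_horizon_objective(
    start_states: Sequence[float], gain: float, dr_horizon: int, tr_horizon: int
) -> float:
    """Average training-rollout return over states produced by the
    distribution rollout; the two horizons are chosen independently."""
    states = distribution_rollout(start_states, gain, dr_horizon)
    returns = [training_rollout_return(s, gain, tr_horizon) for s in states]
    return sum(returns) / len(returns)
```

For example, `double_horizon_objective([1.0, -2.0], gain=1.0, dr_horizon=20, tr_horizon=5)` evaluates the policy on states reached by a 20-step distribution rollout using 5-step training rollouts. The specific horizon values are assumptions for illustration, not settings reported in the paper.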


Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks