Open-World Reinforcement Learning over Long Short-Term Imagination

Jiajian Li; Qi Wang; Yunbo Wang; Xin Jin; Yang Li; Wenjun Zeng; Xiaokang Yang

arXiv:2410.03618 · cs.LG · March 10, 2026

TL;DR

This paper introduces LS-Imagine, a method that extends the imagination horizon of world models in visual reinforcement learning, enabling better exploration and long-term decision-making in open worlds.

Contribution

We propose a long short-term world model that simulates goal-conditioned jumpy transitions, improving exploration in high-dimensional open-world environments.

Findings

01 Outperforms state-of-the-art methods in MineDojo

02 Enhances exploration efficiency for long-horizon tasks

03 Integrates long-term values into behavior learning

Abstract

Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a *long short-term world model*. To achieve this, we simulate goal-conditioned jumpy state…
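To make "extends the imagination horizon within a limited number of state transition steps" concrete, here is a minimal arithmetic sketch. It is not taken from the paper: the function name, the rollout length of 15 transitions, and the jump size of 20 environment steps are all hypothetical values chosen only for illustration.

```python
from typing import Sequence


def imagined_horizon(step_sizes: Sequence[int]) -> int:
    """Environment steps covered by a rollout.

    Each entry is the number of environment steps spanned by one imagined
    transition: 1 for an ordinary short-term step, more than 1 for a jumpy one.
    """
    return sum(step_sizes)


# Hypothetical rollout budget: 15 imagined transitions in both cases.
short_only = [1] * 15                    # every transition is a single step
with_jump = [1] * 5 + [20] + [1] * 9     # one transition jumps 20 steps (assumed size)

short_horizon = imagined_horizon(short_only)
jumpy_horizon = imagined_horizon(with_jump)
```

With the same budget of 15 transitions, the short-only rollout covers 15 environment steps, while the rollout containing one assumed 20-step jump covers 34.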

Peer Reviews

Decision · ICLR 2025 Oral

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1. The jumpy prediction technique within the long-term imagination framework is innovative, as it departs from the fixed-interval approach prevalent in previous work and offers increased flexibility in jumpy prediction.
2. The paper is well-organized and clearly written.

Weaknesses

1. The proposed method employs a hierarchical structure, yet the baseline comparisons are made with flat learning methods. Including comparisons with hierarchical MBRL methods like Director [1] could greatly strengthen the paper.
2. Equation 9 appears to have an inconsistency in the time indexing; should the bootstrapping term $R^\lambda_{t+1}$ be $R^\lambda_{t+\hat{\Delta}_{t+1}+1}$?
3. The use of $\lambda$-return in evaluating the policy might introduce bias since it should be evaluated wit
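For readers unfamiliar with the bootstrapping term the reviewer refers to, the sketch below computes the standard $\lambda$-return recursion $R^\lambda_t = r_t + \gamma\,[(1-\lambda)\,v_{t+1} + \lambda\,R^\lambda_{t+1}]$ used in Dreamer-style agents. It is not the paper's Equation 9. The optional `durations` argument is a hypothetical extension, added here only to illustrate the indexing question: when an imagined transition spans several environment steps, the recursion still bootstraps from the next imagined state, which lies that many steps ahead in environment time. All function and parameter names are assumptions.

```python
from typing import List, Optional, Sequence


def lambda_returns(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap: float,
    gamma: float = 0.99,
    lam: float = 0.95,
    durations: Optional[Sequence[int]] = None,
) -> List[float]:
    """Standard lambda-returns over a sequence of imagined transitions.

    rewards[i]   : reward of the i-th imagined transition
    values[i]    : critic value of the i-th imagined state
    bootstrap    : critic value of the state after the last transition
    durations[i] : ASSUMPTION, environment steps spanned by transition i
                   (defaults to 1 everywhere, the ordinary case)
    """
    horizon = len(rewards)
    if durations is None:
        durations = [1] * horizon
    # Value of the state reached after each transition.
    next_values = list(values[1:]) + [bootstrap]
    returns = [0.0] * horizon
    next_return = bootstrap
    for i in reversed(range(horizon)):
        # Hypothetical: discount a multi-step transition by gamma ** duration.
        discount = gamma ** durations[i]
        next_return = rewards[i] + discount * (
            (1.0 - lam) * next_values[i] + lam * next_return
        )
        returns[i] = next_return
    return returns
```

Setting `lam=1.0` reduces this to a discounted sum of rewards plus the final bootstrap value, and `lam=0.0` reduces it to a one-step target, which makes the two extremes easy to check by hand.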

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The paper is mostly well written.
- The proposed method is novel and the results are promising compared to the baselines.

Weaknesses

- Although the high-level idea is straightforward, the implementation is overcomplicated.
- The method feels very ad hoc to the Minecraft tasks studied in this paper. I cannot think of any relevant tasks other than Minecraft to which the proposed method could be applied.

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- **Significance**
  - Long-horizon world modeling and reinforcement learning in open-world environments are important problems.
  - The proposed approach is insightful and successfully addresses these problems.
- **Originality**
  - The proposed approach involves the combination of multiple novel and insightful components.
- **Quality**
  - Overall, the quality of the paper is relatively high, with the method reasonably clearly explained and analyzed.

Weaknesses

- **Clarity**
  - Some aspects of the paper are not particularly clear. The main one is the use of the word 'jumpy' throughout the paper. The meaning of this word is assumed, but is not defined in the paper or standard usage as far as I'm aware, and is relatively unscientific, so I feel it is not the right word to use. 'Multi-step' state transitions seems more appropriate. If the authors were attempting to highlight that the number of steps can vary, then 'variable-step' transitions would be bet

Code & Models

Repositories

qiwang067/LS-Imagine
PyTorch · Official

Videos

Open-World Reinforcement Learning over Long Short-Term Imagination · SlidesLive

Taxonomy

Topics · Reinforcement Learning in Robotics