Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang

TL;DR
This paper introduces LS-Imagine, a method that extends the imagination horizon of world models in visual reinforcement learning, enabling better exploration and long-term decision-making in open worlds.
Contribution
We propose a long short-term world model that simulates goal-conditioned jumpy transitions, improving exploration in high-dimensional open-world environments.
Findings
Outperforms state-of-the-art methods in MineDojo
Enhances exploration efficiency for long-horizon tasks
Integrates long-term values into behavior learning
Abstract
Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted", as they are typically trained on short snippets of imagined experiences. We argue that the primary challenge in open-world decision-making is improving the exploration efficiency across a vast state space, especially for tasks that demand consideration of long-horizon payoffs. In this paper, we present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps, enabling the agent to explore behaviors that potentially lead to promising long-term feedback. The foundation of our approach is to build a long short-term world model. To achieve this, we simulate goal-conditioned jumpy state…
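To make the idea of integrating long-term values over jumpy transitions concrete, the following is a minimal sketch of a λ-return computed over an imagined sequence whose transitions may span different numbers of environment steps. It is not the paper's Equation 9: the function name `lambda_returns`, the `intervals` argument, the default `gamma` and `lam` values, and the choice to discount each transition by `gamma ** interval` are all assumptions made for illustration. It also assumes that `rewards[t]` already aggregates whatever reward is collected during transition `t`.

```python
import numpy as np


def lambda_returns(rewards, values, intervals, gamma=0.99, lam=0.95):
    """Lambda-returns over a sequence with variable-length transitions.

    Hypothetical helper for illustration only.
    rewards[t]   : reward attributed to transition t, for t = 0..T-1
    values[t]    : value estimate of state t, for t = 0..T (length T + 1)
    intervals[t] : number of environment steps spanned by transition t
                   (1 for an ordinary step, >1 for a jumpy transition)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    intervals = np.asarray(intervals, dtype=np.float64)
    T = len(rewards)
    if len(values) != T + 1 or len(intervals) != T:
        raise ValueError("expected len(values) == T + 1 and len(intervals) == T")

    returns = np.zeros(T)
    next_return = values[-1]  # bootstrap from the final state's value
    for t in reversed(range(T)):
        # Assumption: a transition spanning k steps is discounted by gamma ** k.
        discount = gamma ** intervals[t]
        returns[t] = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        next_return = returns[t]
    return returns
```

With all intervals set to 1, this reduces to the familiar fixed-step λ-return; a larger interval simply discounts the bootstrapped term more heavily, which is one plausible way to keep jumpy and single-step transitions on the same time scale.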
Peer Reviews
Decision·ICLR 2025 Oral
1. The jumpy prediction technique within the long-term imagination framework is innovative as it departs from the fixed interval approach prevalent in previous work, offering increased flexibility in jumpy prediction.
2. The paper is well-organized and clearly written.
1. The proposed method employs a hierarchical structure, yet the baseline comparisons are made with flat learning methods. Including comparisons with hierarchical MBRL methods like Director [1] could greatly strengthen the paper.
2. Equation 9 appears to have an inconsistency in the time indexing; should the bootstrapping term $R^\lambda_{t+1}$ be $R^\lambda_{t+\hat{\Delta}_{t+1}+1}$?
3. The use of $\lambda$-return in evaluating the policy might introduce bias since it should be evaluated wit
- The paper is mostly well written.
- The proposed method is novel and the results are promising compared to the baselines.
- Although the high-level idea is straightforward, the implementation is overcomplicated.
- The method feels very ad hoc, tailored to the Minecraft tasks studied in this paper. I cannot think of any relevant tasks other than Minecraft to which the proposed method could be applied.
- **Significance**
  - Long-horizon world modeling and reinforcement learning in open-world environments are important problems.
  - The proposed approach is insightful and successfully addresses these problems.
- **Originality**
  - The proposed approach involves the combination of multiple novel and insightful components.
- **Quality**
  - Overall the quality of the paper is relatively high, with the method reasonably clearly explained and analyzed.
- **Clarity**
  - Some aspects of the paper are not particularly clear. The main one is the use of the word 'jumpy' throughout the paper. The meaning of this word is assumed, but is not defined in the paper or standard usage as far as I'm aware, and is relatively unscientific, so I feel it is not the right word to use. 'Multi-step' state transitions seems more appropriate. If the authors were attempting to highlight that the number of steps can vary, then 'variable-step' transitions would be bet
Taxonomy
Topics: Reinforcement Learning in Robotics
