OptionZero: Planning with Learned Options
Po-Wei Huang, Pei-Chiun Peng, Hung Guei, Ti-Rong Wu

TL;DR
OptionZero enhances planning in reinforcement learning by autonomously discovering options through self-play, leading to significant performance improvements over MuZero in Atari games.
Contribution
It introduces an option network into MuZero for autonomous option discovery and modifies the dynamics network to enable deeper search with options.
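A minimal sketch of why options can deepen search under a fixed simulation budget, assuming each tree edge costs one node expansion whether it carries a primitive action or a whole option. The names `apply_option`, `depth_in_primitive_steps`, and `toy_step` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from typing import Callable, List, Tuple

Action = int
Option = Tuple[Action, ...]  # an option: a sequence of primitive actions


def apply_option(step: Callable[[int, Action], int], state: int, option: Option) -> int:
    """Reference semantics of an option: apply its primitive actions in order."""
    for action in option:
        state = step(state, action)
    return state


def depth_in_primitive_steps(path: List[Option]) -> int:
    """Environment steps covered by a search path whose edges are options."""
    return sum(len(option) for option in path)


def toy_step(state: int, action: Action) -> int:
    """Hypothetical deterministic environment: the action is added to the state."""
    return state + action


# Both paths use three tree edges, i.e. three node expansions (assumption).
primitive_path: List[Option] = [(1,), (1,), (2,)]
option_path: List[Option] = [(1, 1, 2), (2, 2), (1,)]

primitive_depth = depth_in_primitive_steps(primitive_path)
option_depth = depth_in_primitive_steps(option_path)
```

With the same three expansions, the primitive-only path covers 3 environment steps while the option path covers 6. In this toy, `apply_option` composes primitive steps explicitly; a learned dynamics model that accepts an option as input would instead be expected to produce the resulting state in a single transition.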
Findings
Outperforms MuZero across 26 Atari games, with a 131.58% improvement in mean human-normalized score.
Learns strategic options tailored to different game characteristics.
Demonstrates effective autonomous discovery of options through self-play.
Abstract
Planning with options -- a sequence of primitive actions -- has been shown effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or learned options through expert demonstration data. Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games. Furthermore, we modify the dynamics network to provide environment transitions when using options, allowing searching deeper under the same simulation constraints. Empirical experiments conducted in 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score. Our behavior analysis shows that OptionZero not only learns…
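For readers unfamiliar with the metric, the sketch below shows how a mean human-normalized score is commonly computed for Atari: each game's raw score is rescaled so that a random policy maps to 0% and the human reference maps to 100%, and the per-game values are then averaged. All numbers and game names are hypothetical placeholders, not results from the paper, and this summary does not state whether the reported 131.58% is a relative change or a percentage-point difference.

```python
from typing import Dict, Tuple


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw score so that random play is 0% and the human reference is 100%."""
    return 100.0 * (agent - random) / (human - random)


def mean_human_normalized_score(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Average the per-game normalized scores; values are (agent, random, human)."""
    per_game = [human_normalized_score(a, r, h) for a, r, h in scores.values()]
    return sum(per_game) / len(per_game)


# Hypothetical raw scores for two made-up games: (agent, random, human).
example_scores = {
    "game_a": (150.0, 50.0, 250.0),  # halfway between random and human
    "game_b": (30.0, 10.0, 20.0),    # twice the human margin over random
}

mean_hns = mean_human_normalized_score(example_scores)
```

Because the mean is taken over per-game percentages, a few games with very large normalized scores can dominate it, which is why per-game breakdowns are often reported alongside the mean.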
Peer Reviews
Decision: ICLR 2025 Oral
The need for efficiency in decision making in RL is clear, as single-step actions are slow and computationally expensive (even more so in slow simulators). Thus, the problem addressed by OptionZero is clear and well-motivated. Additionally, since much prior work on options appears to be in manually defined and demonstration-based settings, the generalisability of OptionZero is a strong selling point. Within fixed computational constraints, the idea of decreasing the frequency of
It is unclear why options are outperformed by primitive actions in certain environments. The authors suggest that in environments with high combinatorial complexity, learning the dynamics model may be difficult, and thus options may simply produce more overhead than actual benefit. A more detailed analysis of these environments would be beneficial, e.g., investigating whether there is a correlation between the stochastic branching factor of the environment and the performance of options. Add
The paper is well-written. The authors explain the use of options clearly with a toy example, demonstrating how options are used. The empirical results are also strong, achieving high mean normalized scores.
It's not clear what actual benefits options bring. In the intro, the paper claims options allow for "searching deeper", but the empirical analysis shows "deeper search is likely but not necessary for improving performance". While it's nice to have the option of using options, could the authors provide a more detailed analysis of options beyond a deeper search? The paper could also benefit from discussion of 1) the trade-offs between increased complexity and performance gains 2) how much tuni
- Novel idea for autonomous and adaptable option discovery. The proposed method's ability to autonomously discover and tailor options to diverse game dynamics removes the need for predefined actions, making it highly adaptable across different environments.
- Convincing results for enhanced planning in RL. By integrating an option network, OptionZero reduces decision frequency, enabling computational efficiency, particularly in visually complex tasks like Atari games.
- Strong Performance Gai
- Inconsistent Option Use Across Games: OptionZero's reliance on options appears to vary widely across Atari games. While longer options bring substantial gains in some games, they contribute less in others. This inconsistency suggests that the model’s option-based planning may struggle to generalize well across diverse, complex environments. The paper should discuss this limitation.
- Challenges in Complex Action Spaces: In games with intricate action spaces, such as Bank Heist (Atari), Optio
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · AI-based Problem Solving and Planning
Methods: Batch Normalization · Average Pooling · Prioritized Experience Replay · Residual Connection · Residual Block · Monte-Carlo Tree Search · Convolution · MuZero
