Hierarchical Behaviour Spaces

Michael Tryfan Matthews; Anssi Kanervisto; Jakob Foerster; Pierluca D'Oro; Scott Fujimoto; Mikael Henaff

arXiv:2604.24558·cs.AI·April 28, 2026

TL;DR

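Hierarchical Behaviour Spaces (HBS) lets a high-level controller specify linear combinations of predefined option reward functions, which induces a space of behaviours instead of one behaviour per reward function. The method shows strong performance on the NetHack Learning Environment, and the benefit of hierarchy is attributed to increased exploration.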

Contribution

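HBS uses a set of predefined option reward functions to induce a space of behaviours. Instead of assigning a single reward function to each option, the controller specifies linear combinations over the reward functions, which allows a more expressive set of policies to be represented.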

Findings

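01. HBS achieves strong performance on the NetHack Learning Environment.

02. In HBS, the benefits of hierarchy come from increased exploration rather than long-term reasoning.

03. Letting the controller specify linear combinations over reward functions allows a more expressive set of policies to be represented.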

Abstract

Recent work in hierarchical reinforcement learning has shown success in scaling to billions of timesteps when learning over a set of predefined option reward functions. We show that, instead of using a single reward function per option, the reward functions can be effectively used to induce a space of behaviours, by letting the controller specify linear combinations over reward functions, allowing a more expressive set of policies to be represented. We call this method Hierarchical Behaviour Spaces (HBS). We evaluate HBS on the NetHack Learning Environment, demonstrating strong performance. We conduct a series of experiments and determine that, perhaps going against conventional wisdom, the benefits of hierarchy in our method come from increased exploration rather than long-term reasoning.
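
To make the central idea concrete, here is a minimal sketch of how a weight vector over predefined reward functions could define the reward that a low-level policy is trained on. Everything named in it is an assumption for illustration: the reward functions (`gold_reward`, `depth_reward`, `score_reward`), the dictionary-based `Transition` type, and the helper `combined_reward` do not come from the paper, and the sketch does not cover how the controller chooses weights or how the low-level policy is conditioned on them.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical transition summary: scalar changes in a few game statistics.
Transition = Dict[str, float]
RewardFn = Callable[[Transition], float]


# Hypothetical option reward functions (illustrative names, not from the paper).
def gold_reward(t: Transition) -> float:
    return t.get("gold_delta", 0.0)


def depth_reward(t: Transition) -> float:
    return t.get("depth_delta", 0.0)


def score_reward(t: Transition) -> float:
    return t.get("score_delta", 0.0)


REWARD_FNS: List[RewardFn] = [gold_reward, depth_reward, score_reward]


def combined_reward(
    weights: Sequence[float],
    transition: Transition,
    reward_fns: Sequence[RewardFn] = REWARD_FNS,
) -> float:
    """Return the weighted sum of the reward functions for one transition."""
    if len(weights) != len(reward_fns):
        raise ValueError("need exactly one weight per reward function")
    return sum(w * fn(transition) for w, fn in zip(weights, reward_fns))


def one_hot(index: int, size: int) -> List[float]:
    """Weights that select a single reward function, i.e. one reward per option."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Assumed example transition: the agent gained 5 gold, descended one level,
# and earned 20 score points.
transition: Transition = {"gold_delta": 5.0, "depth_delta": 1.0, "score_delta": 20.0}

# A blended behaviour: some interest in gold, full interest in descending.
blended = combined_reward([0.5, 1.0, 0.0], transition)

# A one-hot weight vector recovers the single-reward-per-option setting.
depth_only = combined_reward(one_hot(1, len(REWARD_FNS)), transition)
```

In this sketch, the one-hot weight vectors correspond to the single-reward-per-option case, and every other weight vector names an additional behaviour. That is one way to read the claim that linear combinations allow a more expressive set of policies to be represented.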
