Accelerating Task Generalisation with Multi-Level Skill Hierarchies
Thomas P Cannon, Özgür Simsek

TL;DR
This paper presents FraCOs, a hierarchical reinforcement learning method that improves generalisation to new tasks by identifying behavioural patterns and forming options, achieving state-of-the-art results in complex environments.
Contribution
Introduces FraCOs, a multi-level hierarchical RL approach that enhances task generalisation through pattern-based options, outperforming existing algorithms.
Findings
FraCOs outperforms state-of-the-art algorithms in complex environments.
In tabular settings, FraCOs transfers effectively and its performance improves as hierarchical depth increases.
FraCOs achieves higher in-distribution and out-of-distribution performance than competing deep reinforcement learning algorithms.
Abstract
Creating reinforcement learning agents that generalise effectively to new tasks is a key challenge in AI research. This paper introduces Fracture Cluster Options (FraCOs), a multi-level hierarchical reinforcement learning method that achieves state-of-the-art performance on difficult generalisation tasks. FraCOs identifies patterns in agent behaviour and forms options based on the expected future usefulness of those patterns, enabling rapid adaptation to new tasks. In tabular settings, FraCOs demonstrates effective transfer and improves performance as it grows in hierarchical depth. We evaluate FraCOs against state-of-the-art deep reinforcement learning algorithms in several complex procedurally generated environments. Our results show that FraCOs achieves higher in-distribution and out-of-distribution performance than competitors.
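The abstract's idea of finding recurring behaviour patterns and ranking them by expected usefulness can be illustrated with a toy sketch. This is not the authors' algorithm: the fixed-length action fragments, the "fraction of tasks containing the fragment" score, and the names `fragments` and `rank_candidate_options` are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Fragment = Tuple[int, ...]


def fragments(actions: Sequence[int], length: int) -> List[Fragment]:
    """Return every contiguous action subsequence of the given length."""
    return [tuple(actions[i:i + length]) for i in range(len(actions) - length + 1)]


def rank_candidate_options(
    trajectories_by_task: Dict[str, List[Sequence[int]]],
    length: int,
    top_k: int,
) -> List[Tuple[Fragment, float]]:
    """Rank fragments by a hypothetical usefulness proxy.

    The proxy (an assumption, not the paper's measure) is the fraction of
    tasks in which the fragment appears at least once.
    """
    task_hits: Counter = Counter()
    for trajectories in trajectories_by_task.values():
        seen = set()
        for actions in trajectories:
            seen.update(fragments(actions, length))
        # Count each fragment at most once per task.
        task_hits.update(seen)

    n_tasks = len(trajectories_by_task)
    scored = [(frag, hits / n_tasks) for frag, hits in task_hits.items()]
    # Highest score first; ties broken by the fragment itself for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    demo = {
        "task_a": [[0, 1, 1, 2]],
        "task_b": [[3, 0, 1, 1]],
        "task_c": [[0, 1, 2, 2]],
    }
    for frag, score in rank_candidate_options(demo, length=2, top_k=2):
        print(frag, round(score, 3))
```

In this toy setting, a fragment that recurs across many tasks is treated as a promising option candidate for a new task. A real system would also need to condition on state and learn when to initiate and terminate each option.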
Peer Reviews
Decision: ICLR 2025 Poster
* Paper is generally well-written.
* Related work appears exhaustive.
* Limitations are adequately addressed.
* Results are generally compelling.
* Findings (with potential caveats; see below) are likely to be useful to the broader field.
- The only domain where appropriate baselines are compared against is ProcGen. The results are still compelling, but another domain where baselines are also evaluated would have been useful to benchmark FraCOs' relative utility. Additionally, some didactic experiments in the tabular settings with more appropriate baselines would be useful.
- Despite discussing more advanced baselines in the related work (most notably HOC, which can support multiple levels of hierarchy), OC is the only one used.
1. Novel approach for discovering re-usable options to accelerate generalization abilities
2. Thorough empirical evaluation of the FraCO method across several benchmarks
3. Clear & honest evaluation of how well the FraCO method may work going forward to additional environments/settings
1. The paper is written in a manner to accelerate the generalization abilities of RL agents. However, there isn't a single mention of sample complexity in the paper. All of the plots and figures are concerned with the overall success rate of the agent.
2. The given results across the 3 benchmarks are difficult to understand given that they are all success rate plots. In reinforcement learning, it is much better practice to use IQM plots [1] in order to compare the performance of multiple algori
The problem of learning options (or other forms of sub-tasks) is an important issue in hierarchical RL. The paper is well-written and generally easy to follow. The motivation for the method provided in Section 4 is strong. The experimental results effectively demonstrate the benefit of FraCOs in terms of generalizing between tasks and accelerating learning in new tasks in discrete state spaces.
There is a lack of discussion of continuous state spaces and no experiment(s) involving them. It seems that a lot of data is required to learn the FraCOs before learning the actual policy, e.g., allowed to discover FraCOs in 50 of the 60 tasks used in Experiment 1. It would be nice to see a comparison of how the amount of FraCO pre-training affects the learning in later tasks and how that relates to, e.g., OC-PPO.
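One of the reviews above recommends IQM (interquartile mean) plots for comparing algorithms. As a minimal sketch of what that statistic computes, the function below discards the lowest and highest 25% of run scores and averages the rest. The function name and the rounding rule (truncating the cut to an integer number of runs on each side) are assumptions; evaluation libraries may handle fractional cut-offs differently.

```python
from statistics import fmean
from typing import Sequence


def interquartile_mean(scores: Sequence[float]) -> float:
    """Mean of the middle 50% of scores (lowest and highest 25% removed)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # Number of runs trimmed from each end; truncation is an assumption.
    cut = int(0.25 * len(ordered))
    return fmean(ordered[cut:len(ordered) - cut])


if __name__ == "__main__":
    # Hypothetical per-run success rates, not results from the paper.
    runs = [0.0, 0.2, 0.4, 0.5, 0.6, 0.7, 0.9, 1.0]
    print(interquartile_mean(runs))
```

Because the extremes are dropped, a single unusually good or bad seed moves the IQM far less than it moves the plain mean.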
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Reinforcement Learning in Robotics
