Flattening Hierarchies with Policy Bootstrapping
John L. Zhou, Jonathan C. Kao

TL;DR
This paper introduces a method for training a flat goal-conditioned policy with policy bootstrapping and advantage-weighted importance sampling, scaling offline goal-conditioned reinforcement learning (GCRL) to complex, long-horizon tasks without a hierarchical structure.
Contribution
The authors propose a training algorithm for flat (non-hierarchical) policies that removes the need for modular, timescale-specific policies, enabling offline GCRL to scale to high-dimensional, long-horizon environments.
Findings
Matches or surpasses state-of-the-art offline GCRL methods.
Successfully scales to complex, long-horizon tasks.
Eliminates reliance on generative models for subgoal spaces.
Abstract
Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on…
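To make the abstract's point about sparse rewards and discounting concrete, here is a minimal sketch. It assumes a deterministic chain environment with a reward of 1 only on reaching the goal and a discount factor of 0.99; the function names `goal_value` and `action_gap` are hypothetical and do not come from the paper. Under these assumptions, the value gap between a step toward the goal and a step away from it shrinks geometrically with distance, which is one way to see why comparing primitive actions against a distant goal becomes difficult.

```python
def goal_value(distance: int, gamma: float = 0.99) -> float:
    """Optimal value of a state `distance` steps from the goal.

    Assumes a sparse reward of 1 received only on reaching the goal,
    so the return is the discount factor raised to the distance.
    """
    return gamma ** distance


def action_gap(distance: int, gamma: float = 0.99) -> float:
    """Value difference between stepping toward and away from the goal.

    A step toward the goal lands at distance - 1; a step away lands
    at distance + 1. The gap equals gamma**(distance - 1) * (1 - gamma**2).
    """
    closer = goal_value(distance - 1, gamma)
    farther = goal_value(distance + 1, gamma)
    return closer - farther


if __name__ == "__main__":
    # Print the gap at increasing distances from the goal.
    for d in (1, 10, 100, 500):
        print(f"distance={d:4d}  gap={action_gap(d):.6f}")
```

In this toy setting the gap at 500 steps is far smaller than the gap at 1 step, so even modest errors in a learned value estimate could outweigh it. The real benchmarks are stochastic and high-dimensional, so this is only an illustration of the scaling argument, not a model of the paper's experiments.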
Taxonomy
Topics: Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation
