Learning Multi-Timescale Abstractions for Hierarchical Combinatorial Planning
Vivienne Huiling Wang, Tinghuai Wang, Joni Pajarinen

TL;DR
This paper presents a hierarchical, model-based reinforcement learning framework that uses multi-timescale abstractions and latent-space planning to solve sequential stochastic combinatorial optimization (SSCO) problems efficiently.
Contribution
It introduces a novel multi-timescale latent-space planning approach combined with a subgoal-conditioned policy for resource-aware decision-making in SSCO.
Findings
Outperforms strong baselines on challenging SSCO benchmarks.
Models variable-duration decisions with a latent-space tree search.
Enables efficient lookahead through multi-timescale latent dynamics.
Abstract
The combination of exponentially large action spaces, stochastic dynamics, and long-horizon decision-making under limited resources makes Sequential Stochastic Combinatorial Optimization (SSCO) particularly challenging for reinforcement learning. Hierarchical Reinforcement Learning (HRL) offers a natural decomposition, but it places the high-level policy in a Semi-Markov Decision Process (SMDP) where actions have variable durations, making it difficult to learn a world model that is suitable for planning. We introduce a model-based hierarchical framework for sequential stochastic combinatorial decision-making that directly addresses this issue. Our method combines a latent-space tree-search planner with an SMDP-aware world model for variable-duration decisions. A multi-timescale objective structures the latent dynamics so that transition magnitudes reflect the effective temporal scales…
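To make the variable-duration point concrete, here is a minimal sketch of a generic one-sample SMDP value target, in which the bootstrap term is discounted by the number of primitive steps the high-level action actually took. This is the standard SMDP backup from the options literature, not the paper's world model or planner; the function name `smdp_backup`, its parameters, and the numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def smdp_backup(rewards: Sequence[float], next_value: float, gamma: float = 0.9) -> float:
    """One-sample SMDP target for a high-level action lasting len(rewards) primitive steps.

    Hypothetical helper for illustration; not taken from the paper.
    """
    duration = len(rewards)
    # Discounted sum of the primitive rewards collected while the action runs.
    accumulated = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the next state's value, discounted by the full duration.
    return accumulated + (gamma ** duration) * next_value


# Two high-level actions that end in equally valuable states but differ in duration.
short_target = smdp_backup([1.0], next_value=10.0, gamma=0.5)
long_target = smdp_backup([1.0, 1.0, 1.0], next_value=10.0, gamma=0.5)
print(short_target, long_target)
```

Because the discount depends on duration, the longer action receives a smaller target even though both reach a state of the same value. A model used for high-level planning therefore has to account for how long each decision lasts, not only where it ends up.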
