Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
Baiting Luo, Ava Pettet, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay

TL;DR
This paper introduces L-MAP, a method that learns macro-actions via a VQ-VAE to enable scalable, low-latency decision-making in high-dimensional, stochastic environments for offline reinforcement learning.
Contribution
L-MAP is the first approach to learn macro-actions through a state-conditional VQ-VAE for scalable decision-making in stochastic, high-dimensional offline RL tasks.
Findings
L-MAP achieves low decision latency despite high action dimensionality.
L-MAP outperforms existing model-based methods in stochastic control tasks.
L-MAP performs comparably to strong model-free baselines.
Abstract
Sequential decision-making in high-dimensional continuous action spaces, particularly in stochastic environments, faces significant computational challenges. We explore this challenge in the traditional offline RL setting, where an agent must learn how to make decisions based on data collected through a stochastic behavior policy. We present Latent Macro Action Planner (L-MAP), which addresses this challenge by learning a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE), effectively reducing action dimensionality. L-MAP employs a (separate) learned prior model that acts as a latent transition model and allows efficient sampling of plausible actions. During planning, our approach accounts for stochasticity in both the environment and the behavior policy by using Monte Carlo tree search (MCTS). In offline RL settings,…
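The quantization step at the heart of a VQ-VAE can be illustrated with a minimal sketch. It is not the authors' implementation: the `quantize` helper, the toy `CODEBOOK`, and the 2-D latent size are assumptions made for illustration, and the state-conditional encoder and decoder networks are omitted. The sketch only shows how a continuous latent (standing in for an encoded macro-action) is snapped to its nearest discrete code, which is what lets a planner search over a small set of code indices instead of a high-dimensional continuous action space.

```python
import math
from typing import List, Sequence, Tuple


def quantize(
    latent: Sequence[float], codebook: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Return (index, vector) of the codebook entry nearest to `latent`.

    Nearest is measured by Euclidean distance, the usual VQ-VAE choice.
    """
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(latent, codebook[i]),
    )
    return best_index, list(codebook[best_index])


# Hypothetical toy codebook: 3 discrete codes in a 2-D latent space.
# A real model learns this codebook jointly with its encoder and decoder.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]

# Stand-in for the encoder output of one (state, macro-action) pair.
encoder_output = [0.9, 0.8]
code_index, code_vector = quantize(encoder_output, CODEBOOK)
print(code_index, code_vector)
```

In a full pipeline the code index, rather than the raw action sequence, would be the unit that a prior model samples and a tree search expands; those parts are outside the scope of this sketch.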
Taxonomy
Topics: Constraint Satisfaction and Optimization · Data Management and Algorithms · Bayesian Modeling and Causal Inference
Methods: Sparse Evolutionary Training
