Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

Baiting Luo; Ava Pettet; Aron Laszka; Abhishek Dubey; Ayan Mukhopadhyay

arXiv:2502.21186 · cs.LG · March 4, 2025

Open Access · 1 Repo

TL;DR

This paper introduces L-MAP, a method that learns macro-actions via a VQ-VAE to enable scalable, low-latency decision-making in high-dimensional, stochastic environments for offline reinforcement learning.

Contribution

L-MAP is the first approach to learn macro-actions through a state-conditional VQ-VAE for scalable decision-making in stochastic, high-dimensional offline RL tasks.
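
To make the quantization idea concrete, here is a minimal sketch of the nearest-codebook lookup at the core of any VQ-VAE. It is not the authors' implementation: the function name `quantize`, the toy two-dimensional codebook, and the plain-Python distance computation are all assumptions made for illustration. In L-MAP the latent would come from an encoder conditioned on the current state and a sequence of actions; that encoder is omitted here.

```python
def quantize(latent, codebook):
    """Return (index, code) for the codebook vector closest to `latent`.

    Closeness is squared Euclidean distance, the usual VQ-VAE choice.
    `latent` is a list of floats; `codebook` is a list of such lists.
    """
    def squared_distance(code):
        return sum((a - b) ** 2 for a, b in zip(latent, code))

    best_index = min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))
    return best_index, codebook[best_index]


# Hypothetical codebook with three macro-action codes.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# A hypothetical encoder output for one (state, action-sequence) pair.
index, code = quantize([0.9, 1.2], codebook)
```

After quantization, a planner only has to choose among a finite set of code indices rather than search a continuous, high-dimensional action space, which is the sense in which the macro-actions reduce action dimensionality.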

Findings

01 L-MAP achieves low decision latency despite high action dimensionality.

02 L-MAP outperforms existing model-based methods in stochastic control tasks.

03 L-MAP performs comparably to strong model-free baselines.

Abstract

Sequential decision-making in high-dimensional continuous action spaces, particularly in stochastic environments, faces significant computational challenges. We explore this challenge in the traditional offline RL setting, where an agent must learn how to make decisions based on data collected through a stochastic behavior policy. We present Latent Macro Action Planner (L-MAP), which addresses this challenge by learning a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE), effectively reducing action dimensionality. L-MAP employs a (separate) learned prior model that acts as a latent transition model and allows efficient sampling of plausible actions. During planning, our approach accounts for stochasticity in both the environment and the behavior policy by using Monte Carlo tree search (MCTS). In offline RL settings,…
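
The abstract says planning uses Monte Carlo tree search over the learned latent macro-actions. As a rough illustration of how a tree search can choose among a handful of discrete codes, the sketch below implements the standard UCB1 selection rule. The excerpt does not state which selection rule L-MAP uses, so UCB1, the function name `ucb1_select`, the `(visit_count, total_return)` statistics layout, and the exploration constant are all assumptions.

```python
import math


def ucb1_select(stats, exploration=1.4):
    """Pick the index of the child to explore next.

    `stats` holds one (visit_count, total_return) pair per candidate
    latent code. Unvisited children are returned first; otherwise the
    child with the highest mean return plus exploration bonus wins.
    """
    total_visits = sum(visits for visits, _ in stats)
    best_index, best_score = None, -math.inf
    for i, (visits, total_return) in enumerate(stats):
        if visits == 0:
            return i  # always try an unvisited code before revisiting others
        mean_return = total_return / visits
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        score = mean_return + bonus
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical statistics for three candidate macro-action codes.
choice_unvisited = ucb1_select([(10, 5.0), (1, 0.4), (0, 0.0)])
choice_exploit = ucb1_select([(10, 5.0), (10, 8.0)])
choice_explore = ucb1_select([(100, 50.0), (1, 0.4)])
```

In a stochastic environment the returns accumulated in `stats` would be averages over sampled outcomes, so repeated visits to the same code are what let the search estimate its expected value rather than a single lucky or unlucky rollout.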

Code & Models

Repositories

baitingluo/l-map
PyTorch · Official

Taxonomy

Topics: Constraint Satisfaction and Optimization · Data Management and Algorithms · Bayesian Modeling and Causal Inference

Methods: Sparse Evolutionary Training