Parallel and Flexible Dynamic Programming via the Randomized Mini-Batch Operator

Matilde Gargiani; Andrea Martinelli; Max Ruts Martinez; John Lygeros

arXiv:2110.02901 · math.OC · October 7, 2021 · 1 citation

TL;DR

This paper introduces a randomized mini-batch operator for dynamic programming that balances convergence speed and parallelization, enabling more flexible and efficient solutions for Markov decision processes.

Contribution

A novel randomized mini-batch operator for DP that combines the convergence benefits of the Gauss-Seidel operator with the parallelism of the Bellman operator, supported by theoretical analysis and extensive experiments.

Findings

1. The new operator converges faster than Bellman-based methods.

2. It offers better parallelization than Gauss-Seidel-based methods.

3. Performance adapts well to different MDP structures and hardware setups.

Abstract

The Bellman operator is the foundation of dynamic programming (DP). An alternative is the Gauss-Seidel operator: whereas the Bellman operator processes all states at once, the Gauss-Seidel operator updates one state at a time and incorporates the interim results into the computation. The provably better convergence rate of DP methods based on the Gauss-Seidel operator comes at the price of an inherent sequentiality, which prevents the exploitation of modern multi-core systems. In this work we propose a new operator for dynamic programming, the randomized mini-batch operator, which aims to trade off the better convergence rate of Gauss-Seidel-based methods against the parallelization capability offered by the Bellman operator. After the introduction of the new operator, a theoretical analysis…
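The contrast drawn in the abstract can be illustrated with a minimal sketch for a finite, discounted, cost-minimizing MDP. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the nested-list layout `P[s][a][t]` (transition probabilities) and `c[s][a]` (stage costs), and in particular `minibatch_sweep`, which is only one plausible reading of a "randomized mini-batch" update (a random partition of the states into batches that are processed one after another, with the states inside a batch updated from the same values). The paper's actual operator may be defined differently.

```python
import random

def backup(V, P, c, gamma, s):
    """One-state backup: minimum expected discounted cost over actions."""
    n = len(V)
    return min(
        c[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
        for a in range(len(c[s]))
    )

def bellman_sweep(V, P, c, gamma):
    # Every state reads the same old V, so the backups are independent
    # and could be evaluated in parallel.
    return [backup(V, P, c, gamma, s) for s in range(len(V))]

def gauss_seidel_sweep(V, P, c, gamma):
    # States are updated one at a time, in place; each backup already
    # sees the values written earlier in the same sweep.
    V = list(V)
    for s in range(len(V)):
        V[s] = backup(V, P, c, gamma, s)
    return V

def minibatch_sweep(V, P, c, gamma, batch_size, rng):
    # ASSUMPTION (hypothetical sketch, not the paper's definition):
    # shuffle the states, split them into batches, and process the
    # batches sequentially. Within a batch all backups read the same V.
    V = list(V)
    order = list(range(len(V)))
    rng.shuffle(order)
    for i in range(0, len(order), batch_size):
        batch = order[i:i + batch_size]
        new_values = [backup(V, P, c, gamma, s) for s in batch]  # parallelizable
        for s, v in zip(batch, new_values):
            V[s] = v  # later batches see these interim results
    return V
```

In this sketch, a batch size equal to the number of states reproduces the synchronous Bellman sweep, while a batch size of 1 behaves like a Gauss-Seidel sweep in random state order; intermediate sizes sit between the two. A call might look like `minibatch_sweep(V, P, c, 0.9, 2, random.Random(0))`.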

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and ELM