Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

Tian Tian; Kenny Young; Richard S. Sutton

arXiv:2207.01613 · cs.LG · November 29, 2022 · 1 citation

TL;DR

This paper introduces doubly-asynchronous value iteration (DAVI), an algorithm that approximates optimal policies in problems with large state and action spaces by maximizing over sampled subsets of actions rather than the full action space, while retaining convergence guarantees and practical effectiveness.

Contribution

DAVI extends asynchronous value iteration from states to actions: each update maximizes over a sample of actions instead of the entire action space, reducing per-update computation while preserving convergence guarantees and near-optimality in large-scale problems.
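A minimal sketch of what "asynchrony in actions" could look like, in Python with NumPy. This is a hypothetical illustration, not the authors' pseudocode: the function name `davi_sketch`, the sample size `m`, the greedy-action array `pi`, the choice to always include the state's current greedy action among the candidates, and the zero initialization (which, with nonnegative rewards, starts values at or below the optimum) are all assumptions. The point it shows is the cost: each backup evaluates m + 1 actions instead of all |A| of them.

```python
import numpy as np


def davi_sketch(P, R, gamma, m, n_sweeps, seed=0):
    """Hypothetical sketch of asynchrony in actions (NOT the paper's exact algorithm).

    P: array (S, A, S) of transition probabilities; R: array (S, A) of rewards.
    Assumption: each update backs up one state, maximizing over m randomly
    sampled actions plus that state's current greedy action pi[s].
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    v = np.zeros(n_states)                        # with R >= 0, this starts at or below v*
    pi = rng.integers(n_actions, size=n_states)   # arbitrary initial greedy actions
    for _ in range(n_sweeps):
        for s in rng.permutation(n_states):       # states visited in an arbitrary order
            sampled = rng.choice(n_actions, size=m, replace=False)
            candidates = np.append(sampled, pi[s])  # keep the incumbent greedy action
            # One-step lookahead for only m + 1 actions, not the whole action space.
            q = R[s, candidates] + gamma * P[s, candidates] @ v
            best = np.argmax(q)
            v[s] = q[best]                        # in-place update, as in asynchronous VI
            pi[s] = candidates[best]
    return v, pi
```

Keeping the incumbent greedy action in the candidate set is one design choice that, under the assumptions above, keeps values from decreasing between visits to a state; the paper's exact update rule and conditions may differ.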

Findings

1. DAVI converges to the optimal value function with probability one.
2. DAVI converges at a near-geometric rate with high probability.
3. Empirical results demonstrate DAVI's effectiveness in large state-action spaces.
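For context, the "geometric" rate in the second finding can be compared with the classical guarantee for standard synchronous VI. With the Bellman optimality operator defined below, discount factor γ < 1, and iterates v_{k+1} = T v_k, the error shrinks by a factor of γ per batch. This is the textbook VI result, not the paper's DAVI bound, which adds a high-probability qualifier and a near-geometric rate:

```latex
(\mathcal{T}v)(s) = \max_{a \in \mathcal{A}} \Big[ r(s,a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s,a)\, v(s') \Big],
\qquad
\|v_k - v^*\|_\infty \le \gamma^{k}\, \|v_0 - v^*\|_\infty .
```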

Abstract

Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many applications. Asynchronous VI helps to address the large state space problem by updating one state at a time, in-place and in an arbitrary order. However, Asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with large action spaces. To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to states and actions. More concretely, DAVI maximizes over a…
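The abstract's contrast between batch (synchronous) VI and asynchronous VI can be made concrete with a small NumPy sketch. The random MDP, the function names `random_mdp`, `sync_vi`, and `async_vi`, and the uniformly random choice of which state to update are illustrative assumptions, not taken from the paper. Synchronous VI replaces every state's value using the previous batch; asynchronous VI updates one state in place but still maximizes over all actions, which is the cost DAVI targets.

```python
import numpy as np


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical random MDP: P[s, a] is a probability distribution over next states."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)        # normalize each row into probabilities
    R = rng.random((n_states, n_actions))    # rewards in [0, 1)
    return P, R


def sync_vi(P, R, gamma, n_sweeps):
    """Synchronous VI: every state is backed up from the previous batch's values."""
    v = np.zeros(P.shape[0])
    for _ in range(n_sweeps):
        q = R + gamma * P @ v                # shape (S, A): lookahead for all pairs
        v = q.max(axis=1)                    # the whole batch is replaced at once
    return v


def async_vi(P, R, gamma, n_updates, seed=0):
    """Asynchronous VI: one state at a time, in place, in an arbitrary (here random) order."""
    rng = np.random.default_rng(seed)
    v = np.zeros(P.shape[0])
    for _ in range(n_updates):
        s = rng.integers(P.shape[0])         # pick any state
        # Still a maximization over the *entire* action space for this state.
        v[s] = np.max(R[s] + gamma * P[s] @ v)
    return v
```

On a small MDP, for example `P, R = random_mdp(6, 4)`, both `sync_vi(P, R, 0.9, 500)` and `async_vi(P, R, 0.9, 5000)` should agree to within floating-point tolerance, since both converge to the same fixed point of the Bellman optimality operator.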

Videos

Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification