Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions
Tian Tian, Kenny Young, Richard S. Sutton

TL;DR
This paper introduces doubly-asynchronous value iteration (DAVI), an algorithm that approximates optimal policies in large state and action spaces by sampling actions instead of maximizing over all of them, while retaining convergence guarantees and performing well empirically.
Contribution
DAVI extends asynchronous value iteration by sampling actions, reducing computation while preserving convergence properties and near-optimality in large-scale problems.
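As a rough illustration of what "sampling actions" can mean for a single in-place update, the sketch below backs up one state by maximizing over a small random subset of actions plus the best action remembered for that state. This is a minimal sketch under stated assumptions, not the authors' exact algorithm: the function names, the stored `policy` list, the subset size `m`, the uniform sampling, the zero initialization, and the random toy MDP with rewards in [0, 1] are all hypothetical choices made for the example.

```python
import random

def sampled_backup(V, policy, P, R, gamma, s, m, rng):
    """Back up state s in place, maximizing over m sampled actions plus policy[s]."""
    n_actions = len(R[s])
    candidates = rng.sample(range(n_actions), min(m, n_actions))
    candidates.append(policy[s])  # assumption: keep the best action found so far

    def q(a):
        # One-step lookahead value of action a in state s under the current V.
        return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

    best = max(candidates, key=q)
    V[s] = q(best)
    policy[s] = best

def make_random_mdp(n_states, n_actions, rng):
    """Hypothetical toy MDP: rewards in [0, 1], dense random transitions."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R

rng = random.Random(0)
P, R = make_random_mdp(n_states=5, n_actions=8, rng=rng)
gamma = 0.9
V = [0.0] * 5       # zero initialization (an assumption of this sketch)
policy = [0] * 5    # arbitrary initial action per state

# Update one randomly chosen state at a time, looking at only 2 of 8 actions
# (plus the remembered one) per update.
for _ in range(20000):
    s = rng.randrange(5)
    sampled_backup(V, policy, P, R, gamma, s, m=2, rng=rng)
```

Each update here evaluates at most `m + 1` actions rather than all of them, which is where the computational saving would come from when the action space is large. On this toy problem the values can be compared against a full-maximization value iteration run to check agreement.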
Findings
DAVI converges to the optimal value function with probability one.
DAVI achieves a near-geometric convergence rate with high probability.
Empirical results demonstrate DAVI's effectiveness in large state-action spaces.
Abstract
Value iteration (VI) is a foundational dynamic programming method, important for learning and planning in optimal control and reinforcement learning. VI proceeds in batches, where the update to the value of each state must be completed before the next batch of updates can begin. Completing a single batch is prohibitively expensive if the state space is large, rendering VI impractical for many applications. Asynchronous VI helps to address the large state space problem by updating one state at a time, in-place and in an arbitrary order. However, Asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with large action spaces. To address this issue, we propose doubly-asynchronous value iteration (DAVI), a new algorithm that generalizes the idea of asynchrony from states to states and actions. More concretely, DAVI maximizes over a…
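The contrast drawn above between batch VI and Asynchronous VI can be made concrete with a small sketch. The code below is illustrative only: the helper names and the two-state, two-action toy MDP are assumptions introduced for this example and do not come from the paper.

```python
def q_value(V, P, R, gamma, s, a):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def synchronous_sweep(V, P, R, gamma):
    """Batch VI: every state is backed up from the same old V, giving a new list."""
    return [max(q_value(V, P, R, gamma, s, a) for a in range(len(R[s])))
            for s in range(len(V))]

def asynchronous_backup(V, P, R, gamma, s):
    """Asynchronous VI: back up one state in place; still a max over every action."""
    V[s] = max(q_value(V, P, R, gamma, s, a) for a in range(len(R[s])))

# Hypothetical toy MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
gamma = 0.5

V_sync = [0.0, 0.0]
for _ in range(200):
    V_sync = synchronous_sweep(V_sync, P, R, gamma)

V_async = [0.0, 0.0]
for step in range(400):
    # Any order that keeps visiting every state works; here we simply alternate.
    asynchronous_backup(V_async, P, R, gamma, step % 2)
```

Both loops arrive at the same values on this toy problem. The inner `max` over all actions appears in both variants, and that is the cost the abstract identifies as the remaining bottleneck when the action space is large.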
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification
