A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning
Ege C. Kaya, Abolfazl Hashemi

TL;DR
This paper develops a finite-iteration theoretical framework for asynchronous categorical distributional temporal-difference learning, narrowing the gap between existing theory, which largely covers synchronous full-state updates, and the single-state updates used in practical implementations.
Contribution
It introduces a finite-iteration analysis for asynchronous categorical TD methods, applicable to both scalar and multivariate settings, under various sampling regimes.
Findings
Provides finite-iteration guarantees for asynchronous categorical TD algorithms.
Establishes contraction properties in a statewise supremum norm after isometric embeddings.
Covers both discounted and undiscounted fixed-horizon problems under different sampling assumptions.
Abstract
Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime. It asynchronously performs a single-state update at each iteration and, in online settings, is driven by a Markovian trajectory. This leaves an important gap between existing finite-iteration theory and the categorical recursions most closely aligned with practical distributional temporal-difference implementations. We bridge this gap for two categorical policy-evaluation methods: scalar categorical temporal-difference learning in the Cramér geometry and multivariate signed-categorical temporal-difference learning in the…
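To make the "single-state update" regime concrete, here is a minimal sketch of one asynchronous scalar categorical TD step in its commonly used tabular form. It is not taken from the paper: the function names `project_onto_support` and `async_categorical_td_step`, the evenly spaced support, the discount `gamma`, and the constant step size `alpha` are all assumptions made for illustration.

```python
import numpy as np

def project_onto_support(atoms, probs, support):
    """Project a discrete distribution onto an evenly spaced support.

    Each atom's mass is split linearly between its two neighbouring grid
    points; atoms outside the grid are clipped to its endpoints.
    (Hypothetical helper, assumed form of the categorical projection.)
    """
    z_min, z_max = support[0], support[-1]
    delta = support[1] - support[0]
    clipped = np.clip(atoms, z_min, z_max)
    pos = (clipped - z_min) / delta                # fractional grid index
    lower = np.floor(pos).astype(int)
    upper = np.minimum(lower + 1, len(support) - 1)
    w_upper = pos - lower                          # share sent to the upper neighbour
    out = np.zeros(len(support))
    np.add.at(out, lower, probs * (1.0 - w_upper)) # accumulate, allowing repeated indices
    np.add.at(out, upper, probs * w_upper)
    return out

def async_categorical_td_step(P, s, r, s_next, support, gamma, alpha):
    """Update only the categorical estimate at state s from one transition.

    P is an (n_states, n_atoms) array of probability vectors. All other
    rows are left unchanged, which is what makes the step asynchronous.
    """
    # Shift and scale the next state's atoms, then project back onto the grid.
    target = project_onto_support(r + gamma * support, P[s_next], support)
    P_new = P.copy()
    P_new[s] = (1.0 - alpha) * P[s] + alpha * target  # convex mixture stays a distribution
    return P_new
```

In an online setting, `(s, r, s_next)` would come from consecutive steps of a single trajectory rather than from independent draws at every state, so each iteration touches exactly one row of `P`.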
