An Analysis of Action-Value Temporal-Difference Methods That Learn State Values

Brett Daley; Prabhat Nagarajan; Martha White; Marlos C. Machado

arXiv:2507.09523 · cs.LG · September 5, 2025

TL;DR

This paper analyzes the convergence and sample efficiency of action-value TD methods that learn state values as an intermediate step, and introduces a new algorithm, RDQ, that outperforms Dueling DQN on the MinAtar benchmark.

Contribution

The paper provides a theoretical comparison of the QV-learning and AV-learning families and introduces RDQ, a novel AV-learning algorithm that significantly outperforms Dueling DQN.

Findings

01. AV-learning offers major benefits over Q-learning in control tasks.

02. Both families outperform Expected Sarsa in prediction tasks.

03. RDQ significantly outperforms Dueling DQN in MinAtar benchmarks.

Abstract

The hallmark feature of temporal-difference (TD) learning is bootstrapping: using value predictions to generate new value predictions. The vast majority of TD methods for control learn a policy by bootstrapping from a single action-value function (e.g., Q-learning and Sarsa). Significantly less attention has been given to methods that bootstrap from two asymmetric value functions: i.e., methods that learn state values as an intermediate step in learning action values. Existing algorithms in this vein can be categorized as either QV-learning or AV-learning. Though these algorithms have been investigated to some degree in prior work, it remains unclear if and when it is advantageous to learn two value functions instead of just one -- and whether such approaches are theoretically sound in general. In this paper, we analyze these algorithmic families in terms of convergence and sample…
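To make the one-function versus two-function distinction concrete, here is a minimal tabular sketch in Python. It contrasts a standard Q-learning update with the commonly cited form of the QV-learning update, in which both the state-value table and the action-value table bootstrap from the state value of the next state. The function names, the step sizes `alpha` and `beta`, and the table layout are assumptions chosen for illustration, not the paper's notation. AV-learning and RDQ are not shown because this excerpt does not specify their update rules.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One-function bootstrapping: the target is built from max_a' Q(s', a')."""
    bootstrap = 0.0 if done else gamma * np.max(Q[s_next])
    target = r + bootstrap
    Q[s, a] += alpha * (target - Q[s, a])


def qv_learning_update(Q, V, s, a, r, s_next, done,
                       alpha=0.1, beta=0.1, gamma=0.99):
    """Two-function bootstrapping (illustrative QV-learning form).

    Both V(s) and Q(s, a) move toward the same target r + gamma * V(s'),
    so the state-value table acts as an intermediate for the action values.
    """
    bootstrap = 0.0 if done else gamma * V[s_next]
    target = r + bootstrap  # computed once, before either table changes
    V[s] += beta * (target - V[s])
    Q[s, a] += alpha * (target - Q[s, a])


if __name__ == "__main__":
    n_states, n_actions = 2, 2  # hypothetical toy problem size
    Q = np.zeros((n_states, n_actions))
    V = np.zeros(n_states)
    qv_learning_update(Q, V, s=0, a=1, r=1.0, s_next=1, done=False,
                       alpha=0.5, beta=0.5, gamma=0.9)
    print(V, Q)
```

In this sketch the Q table never bootstraps from itself; it only reads the V table, which is what makes the two value functions asymmetric.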

Taxonomy

Topics: Complex Systems and Decision Making