Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning

Yuchen Jiao; Jiin Woo; Gen Li; Gauri Joshi; Yuejie Chi

arXiv:2601.13642·stat.ML·January 21, 2026

TL;DR

This paper analyzes the sample complexity of Q-learning for average-reward Markov decision processes, providing new bounds and extending the analysis to federated reinforcement learning with multiple agents.

Contribution

The paper introduces the first federated Q-learning algorithm for average-reward MDPs with provable sample and communication complexity guarantees.

Findings

01 Single-agent Q-learning achieves improved sample complexity bounds.

02 Federated Q-learning reduces per-agent sample complexity with minimal communication rounds.

03 First theoretical analysis of federated reinforcement learning in average-reward settings.

Abstract

Average-reward reinforcement learning offers a principled framework for long-term decision-making by maximizing the mean reward per time step. Although Q-learning is a widely used model-free algorithm with established sample complexity in discounted and finite-horizon Markov decision processes (MDPs), its theoretical guarantees for average-reward settings remain limited. This work studies a simple but effective Q-learning algorithm for average-reward MDPs with finite state and action spaces under the weakly communicating assumption, covering both single-agent and federated scenarios. For the single-agent case, we show that Q-learning with carefully chosen parameters achieves sample complexity $O\left(\frac{|S||A|\,\|h^{\star}\|_{\mathrm{sp}}^{3}}{\varepsilon^{3}}\right)$, where $\|h^{\star}\|_{\mathrm{sp}}$ is the span norm of the bias function, improving…
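To make the single-agent bound quoted in the abstract concrete, the following is a minimal Python sketch of how its leading term scales. It assumes the standard definition of the span seminorm (largest entry minus smallest entry of a vector), which the abstract does not restate. The function names `span_norm` and `single_agent_bound` are hypothetical and do not come from the paper. The constants and any logarithmic factors hidden by the $O(\cdot)$ notation are ignored, so the returned number indicates scaling only, not an actual sample count.

```python
import numpy as np


def span_norm(h):
    """Span seminorm of a vector: max entry minus min entry (standard definition, assumed here)."""
    h = np.asarray(h, dtype=float)
    return float(h.max() - h.min())


def single_agent_bound(num_states, num_actions, h_span, eps):
    """Leading term |S||A| * span^3 / eps^3 of the bound in the abstract.

    Hypothetical helper: constants and log factors hidden by O(.) are omitted.
    """
    return num_states * num_actions * h_span ** 3 / eps ** 3


# Toy bias vector, chosen only for illustration.
h_star = [1.0, 3.5, -0.5]
sp = span_norm(h_star)

# Halving the target accuracy eps multiplies the leading term by 2^3 = 8.
coarse = single_agent_bound(num_states=10, num_actions=4, h_span=2.0, eps=0.5)
fine = single_agent_bound(num_states=10, num_actions=4, h_span=2.0, eps=0.25)
```

Because the dependence on both $\|h^{\star}\|_{\mathrm{sp}}$ and $1/\varepsilon$ is cubic, doubling the span or halving $\varepsilon$ each increases the leading term by a factor of eight, while the dependence on $|S||A|$ is linear.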

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Age of Information Optimization