Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning
Yuchen Jiao, Jiin Woo, Gen Li, Gauri Joshi, Yuejie Chi

TL;DR
This paper analyzes the sample complexity of Q-learning for average-reward Markov decision processes, providing new bounds and extending the analysis to federated reinforcement learning with multiple agents.
Contribution
The paper introduces the first federated Q-learning algorithm for average-reward MDPs with provable sample and communication complexity guarantees.
Findings
Single-agent Q-learning achieves improved sample complexity bounds.
Federated Q-learning reduces the per-agent sample complexity while requiring only a small number of communication rounds.
First theoretical analysis of federated reinforcement learning in average-reward settings.
Abstract
Average-reward reinforcement learning offers a principled framework for long-term decision-making by maximizing the mean reward per time step. Although Q-learning is a widely used model-free algorithm with established sample complexity in discounted and finite-horizon Markov decision processes (MDPs), its theoretical guarantees for average-reward settings remain limited. This work studies a simple but effective Q-learning algorithm for average-reward MDPs with finite state and action spaces under the weakly communicating assumption, covering both single-agent and federated scenarios. For the single-agent case, we show that Q-learning with carefully chosen parameters achieves a sample complexity bound expressed in terms of the span norm of the bias function, improving…
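The abstract does not spell out the update rule, so the following is a minimal sketch of one standard average-reward Q-learning scheme, relative value iteration (RVI) Q-learning, and not necessarily the algorithm analyzed in the paper. It assumes a generative model that returns one sample s' ~ P(·|s,a) for every state-action pair per iteration, uses the mean of the Q-table as the reference value f(Q) (at a fixed point, f(Q) equals the optimal gain), and uses a hypothetical step size α_t = (t+1)^(-0.6). The federated variant, in which several agents update local tables from their own samples and average them every `sync_every` iterations (one communication round), is likewise an illustrative assumption; the function name `rvi_q_learning` and its parameters are hypothetical. For reference, the span norm of a bias vector h is max_s h(s) − min_s h(s).

```python
import numpy as np


def rvi_q_learning(P, R, num_iters, num_agents=1, sync_every=1, seed=0):
    """Synchronous RVI Q-learning with a generative model, optionally federated.

    Illustrative sketch only (hypothetical names and step size).
    P has shape (S, A, S) with P[s, a] a distribution over next states;
    R has shape (S, A). Each of the num_agents agents keeps its own Q-table
    and draws its own samples; every sync_every iterations the tables are
    replaced by their average (one communication round).
    Returns the averaged Q-table and its mean, used as the gain estimate.
    """
    num_states, num_actions, _ = P.shape
    rngs = [np.random.default_rng(seed + k) for k in range(num_agents)]
    Q = np.zeros((num_agents, num_states, num_actions))
    for t in range(num_iters):
        alpha = 1.0 / (t + 1) ** 0.6  # hypothetical decaying step size
        for k in range(num_agents):
            # Generative model: one sample s' ~ P(. | s, a) for every (s, a).
            next_states = np.array([
                [rngs[k].choice(num_states, p=P[s, a]) for a in range(num_actions)]
                for s in range(num_states)
            ])
            # RVI target: r + max_a' Q(s', a') - f(Q), with f(Q) = mean of Q.
            target = R + Q[k].max(axis=1)[next_states] - Q[k].mean()
            Q[k] = (1.0 - alpha) * Q[k] + alpha * target
        if (t + 1) % sync_every == 0:
            Q[:] = Q.mean(axis=0)  # communication round: average local tables
    Q_avg = Q.mean(axis=0)
    return Q_avg, Q_avg.mean()
```

With `num_agents=1` and `sync_every=1` this reduces to single-agent synchronous RVI Q-learning. Raising `sync_every` lowers the number of communication rounds per iteration, which is the quantity the paper's federated guarantees weigh against per-agent sample complexity.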
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Age of Information Optimization
