Unsynchronized Decentralized Q-Learning: Two Timescale Analysis By Persistence
Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel

TL;DR
This paper analyzes an unsynchronized decentralized Q-learning algorithm in multi-agent reinforcement learning, demonstrating it can converge to equilibrium without the need for synchronized policy updates, thus broadening its practical applicability.
Contribution
It introduces a high-probability convergence analysis for an unsynchronized variant of decentralized Q-learning using constant learning rates, relaxing previous synchronization assumptions.
Findings
Convergence to equilibrium with high probability
Constant learning rates are critical to the analysis
Applicable to a range of decentralized algorithms
Abstract
Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn. Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways, including synchronizing times at which agents are allowed to revise their policies. Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchronization is infeasible in many decentralized applications. In this paper, we study an unsynchronized variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games. We provide sufficient conditions under which the unsynchronized algorithm drives play to equilibrium with high probability. Our solution utilizes constant learning rates in the Q-factor update, which we show to be critical for relaxing…
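To make the "constant learning rates in the Q-factor update" concrete, here is a minimal sketch of a single agent's tabular Q-factor update with a fixed step size. It is an illustration only, not the paper's algorithm: the function name `q_update`, the reward-maximizing convention, and the default values of `alpha` and `gamma` are all assumptions made for this example.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-factor update with a constant learning rate.

    q is a list of lists indexed as q[state][action]. alpha is held fixed
    across updates rather than decayed over time. All names and default
    values here are hypothetical, chosen for illustration.
    """
    # Greedy estimate of the value of the next state (reward-maximizing form).
    best_next = max(q[next_state])
    target = reward + gamma * best_next
    # Convex combination of the old estimate and the new target.
    q[state][action] = (1 - alpha) * q[state][action] + alpha * target
    return q[state][action]


# Two states, two actions; the agent only sees its own state, action, and reward.
q_table = [[0.0, 0.0], [1.0, 2.0]]
updated = q_update(q_table, state=0, action=1, reward=1.0, next_state=1,
                   alpha=0.5, gamma=0.5)
```

Because `alpha` does not decay, each update keeps giving recent observations a fixed weight, so the estimates can keep tracking an environment that shifts as other agents change their policies. That intuition is offered here as general background, not as a statement of the paper's proof.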
Taxonomy
Topics: Game Theory and Applications · Age of Information Optimization · Reinforcement Learning in Robotics
Methods: Q-Learning
