A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
Gugan Thoppe, Bhumesh Kumar

TL;DR
This paper establishes a novel law of iterated logarithm for distributed nonlinear stochastic approximation in multi-agent reinforcement learning, providing almost sure convergence rates under weaker assumptions than previous results.
Contribution
The paper introduces the first law of iterated logarithm for distributed stochastic approximation. The result applies to MARL, and the convergence rates it gives do not depend on the interaction graph.
Findings
Distributed TD(0) with stepsize n^(-γ), γ ∈ (0, 1), converges at rate O(√(n^(-γ) log n)) a.s.
With stepsize 1/n, the convergence rate is O(√(n^(-1) log log n)) a.s.
The results do not require the gossip matrix to be doubly stochastic or the stepsizes to be square summable.
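To give a feel for the log log n envelope in the 1/n case, here is a minimal single-agent sketch. It is not the paper's distributed scheme. It assumes the simplest scalar stochastic approximation x_n = x_(n-1) + (1/n)(ξ_n - x_(n-1)) with i.i.d. standard normal noise ξ_n and x_0 = 0, which reduces to a running mean, so the classical law of the iterated logarithm applies. The function name `lil_ratio` and all parameters are hypothetical.

```python
import numpy as np

def lil_ratio(num_steps=100_000, seed=0):
    """Ratio |x_n| / sqrt(2 log log n / n) along one sample path (toy example)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(num_steps)
    n = np.arange(1, num_steps + 1)
    # With stepsize 1/n and x_0 = 0, the recursion
    # x_n = x_{n-1} + (1/n) * (noise_n - x_{n-1}) is exactly the running mean.
    x = np.cumsum(noise) / n
    # Skip small n, where log log n is not yet positive or is very small.
    start = 100
    envelope = np.sqrt(2.0 * np.log(np.log(n[start:])) / n[start:])
    return np.abs(x[start:]) / envelope

ratios = lil_ratio()
print("largest ratio on this path:", ratios.max())
```

For this toy recursion the classical law of the iterated logarithm says the ratio has limit superior 1 almost surely, so on a typical path it stays bounded instead of drifting upward as n grows.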
Abstract
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable. As an application, we show that,…
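The role of a gossip matrix that is not doubly stochastic can be illustrated with a small toy simulation. The update rule below is an assumption made for illustration and is not taken from the paper: three agents each take a noisy local step toward a private target b_i with stepsize n^(-γ), then mix their iterates through a row-stochastic matrix W. The names `distributed_sa` and `stationary_distribution`, the matrix W, and the targets b are all hypothetical.

```python
import numpy as np

def distributed_sa(num_steps=20_000, gamma=0.75, seed=0):
    """Toy distributed stochastic approximation with a row-stochastic gossip matrix."""
    rng = np.random.default_rng(seed)
    # Rows sum to 1, but columns do not, so W is not doubly stochastic.
    W = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.3, 0.7]])
    b = np.array([1.0, 2.0, 4.0])  # hypothetical private targets, one per agent
    x = np.zeros(3)
    for n in range(1, num_steps + 1):
        a_n = n ** (-gamma)  # stepsize n^(-gamma); not square summable when gamma <= 1/2
        noise = rng.standard_normal(3)
        # Local noisy step toward b_i, followed by one round of gossip averaging.
        x = W @ (x + a_n * (b - x + noise))
    return W, b, x

def stationary_distribution(W):
    """Left eigenvector of W for eigenvalue 1, normalised to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(W.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return pi / pi.sum()

W, b, x = distributed_sa()
pi = stationary_distribution(W)
print("final iterates:", x)
print("pi-weighted target:", pi @ b)
```

In this toy scheme the agents agree on the average of the targets weighted by the stationary distribution of W, not on the plain average, which is the usual consequence of dropping the doubly stochastic requirement.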
Taxonomy
Topics: Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems · Advanced Bandit Algorithms Research
