Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation
Leo Muxing Wang, Pengkun Yang, Lili Su

TL;DR
Inspired by personalized federated learning, this paper studies personalized multi-agent average reward TD learning that exploits a shared linear structure across agents to speed up convergence and filter out conflicting signals.
Contribution
It analyzes the convergence of cooperative single-timescale TD learning in which agents iteratively estimate a common subspace and their own local heads, handling both heterogeneity across agents and Markovian sampling.
Findings
Achieves linear speedup in convergence.
Effectively filters out conflicting signals.
Demonstrates benefits through experiments.
Abstract
We study personalized multi-agent average reward TD learning, in which a collection of agents interacts with different environments and jointly learns their respective value functions. We focus on the setting where there exists a shared linear representation, and the agents' optimal weights collectively lie in an unknown linear subspace. Inspired by the recent success of personalized federated learning (PFL), we study the convergence of cooperative single-timescale TD learning in which agents iteratively estimate the common subspace and local heads. We show that this decomposition can filter out conflicting signals, effectively mitigating the negative impacts of "misaligned" signals and achieving linear speedup. The main technical challenges lie in the heterogeneity, the Markovian sampling, and their intricate interplay in shaping error evolutions. Specifically, not only are the…
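To make the subspace-plus-heads decomposition concrete, here is a minimal sketch of one way such a scheme could look. It is not the authors' algorithm: the toy Markov reward processes, the function name run_sketch, the step size alpha, the dimensions, and the QR re-orthonormalization step are all assumptions made for illustration. Each agent i holds a local head h_i, all agents share a basis B with orthonormal columns, and agent i's weight vector is w_i = B h_i. Every step uses the same step size for the head, the basis, and the average reward estimate, which is one reading of "single-timescale".

```python
import numpy as np


def run_sketch(num_agents=4, n_states=6, d=5, r=2, steps=2000, alpha=0.01, seed=0):
    """Toy cooperative average-reward TD(0) with weights w_i = B @ h_i (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Shared feature map: one d-dimensional feature vector per state (toy assumption).
    Phi = rng.standard_normal((n_states, d)) / np.sqrt(d)

    # Heterogeneous toy environments: each agent has its own transitions and rewards.
    envs = []
    for _ in range(num_agents):
        P = rng.random((n_states, n_states))
        P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
        envs.append((P, rng.random(n_states)))

    B, _ = np.linalg.qr(rng.standard_normal((d, r)))  # shared basis, orthonormal columns
    heads = np.zeros((num_agents, r))                 # local heads h_i
    eta = np.zeros(num_agents)                        # per-agent average reward estimates
    states = rng.integers(n_states, size=num_agents)

    for _ in range(steps):
        B_dir = np.zeros_like(B)
        new_heads = heads.copy()
        for i, (P, rewards) in enumerate(envs):
            s = states[i]
            s_next = rng.choice(n_states, p=P[s])  # Markovian sampling along one trajectory
            w = B @ heads[i]
            # Average-reward TD error: r - eta + phi(s')^T w - phi(s)^T w.
            delta = rewards[s] - eta[i] + (Phi[s_next] - Phi[s]) @ w
            B_dir += delta * np.outer(Phi[s], heads[i])          # agent's direction for B
            new_heads[i] = heads[i] + alpha * delta * (B.T @ Phi[s])
            eta[i] += alpha * (rewards[s] - eta[i])
            states[i] = s_next
        # Average the agents' directions, then re-orthonormalize the shared basis.
        B, R = np.linalg.qr(B + alpha * B_dir / num_agents)
        # Absorb R into the heads so each product B @ h_i matches the pre-QR update.
        heads = new_heads @ R.T
    return B, heads, eta
```

Calling run_sketch() returns the shared basis, the per-agent heads, and the per-agent average reward estimates. Averaging the basis directions over agents is where cooperation enters in this sketch; the heads stay local, so each agent can still fit its own environment.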
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis
