Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation

Leo Muxing Wang; Pengkun Yang; Lili Su

arXiv:2603.02426·cs.LG·March 10, 2026

Open Access

TL;DR

This paper studies personalized multi-agent average-reward TD learning in which agents share a linear representation. Splitting each agent's weights into a common subspace and a local head filters out conflicting signals across agents and yields linear speedup, an approach inspired by personalized federated learning.

Contribution

It analyzes the convergence of a cooperative single-timescale TD learning method in which agents iteratively estimate a common subspace and their local heads, addressing heterogeneity across agents and Markovian sampling.

Findings

01. Achieves linear speedup in convergence.

02. Effectively filters out conflicting signals.

03. Demonstrates benefits through experiments.

Abstract

We study personalized multi-agent average reward TD learning, in which a collection of agents interacts with different environments and jointly learns their respective value functions. We focus on the setting where there exists a shared linear representation, and the agents' optimal weights collectively lie in an unknown linear subspace. Inspired by the recent success of personalized federated learning (PFL), we study the convergence of cooperative single-timescale TD learning in which agents iteratively estimate the common subspace and local heads. We show that this decomposition can filter out conflicting signals, effectively mitigating the negative impacts of "misaligned" signals and achieving linear speedup. The main technical challenges lie in the heterogeneity, the Markovian sampling, and their intricate interplay in shaping error evolutions. Specifically, not only are the…
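The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the update rules, the QR re-orthonormalization, the step sizes, and every name below (`personalized_td_step`, `B`, `W`, `eta`) are assumptions made for illustration. Each agent i is assumed to hold weights theta_i = B w_i, where B is a shared d-by-r basis and w_i is a local head, and all quantities are updated with step sizes of the same order (a single timescale).

```python
import numpy as np

def personalized_td_step(B, W, eta, phi_s, phi_next, rewards, alpha=0.05, beta=0.05):
    """One illustrative cooperative average-reward TD(0) step (hypothetical sketch).

    B:        (d, r) shared basis with orthonormal columns
    W:        (N, r) local heads, one row per agent
    eta:      (N,)   per-agent average-reward estimates
    phi_s:    (N, d) features of each agent's current state
    phi_next: (N, d) features of each agent's next state
    rewards:  (N,)   observed rewards
    """
    theta = W @ B.T                                # (N, d): theta_i = B w_i
    v_s = np.sum(phi_s * theta, axis=1)            # value estimates at s
    v_next = np.sum(phi_next * theta, axis=1)      # value estimates at s'
    delta = rewards - eta + v_next - v_s           # average-reward TD errors

    # Local heads: semi-gradient step using features projected through B.
    W_new = W + alpha * delta[:, None] * (phi_s @ B)

    # Shared basis: average the agents' rank-one directions delta_i * phi_i w_i^T,
    # then re-orthonormalize with a reduced QR factorization (an assumed choice).
    grad_B = np.einsum("n,nd,nr->dr", delta, phi_s, W) / len(rewards)
    B_new, _ = np.linalg.qr(B + alpha * grad_B)

    # Running estimate of each agent's average reward.
    eta_new = eta + beta * (rewards - eta)
    return B_new, W_new, eta_new

# Usage with synthetic inputs (random features stand in for Markovian samples).
rng = np.random.default_rng(0)
d, r, n_agents = 6, 2, 4
B, _ = np.linalg.qr(rng.standard_normal((d, r)))
W = rng.standard_normal((n_agents, r))
eta = np.zeros(n_agents)
B, W, eta = personalized_td_step(
    B, W, eta,
    rng.standard_normal((n_agents, d)),
    rng.standard_normal((n_agents, d)),
    rng.standard_normal(n_agents),
)
```

Averaging only the basis update across agents while keeping the heads local is what lets the shared component pool information; in this sketch each agent's personalization stays in its own `w_i`.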

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis