Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic

Leo Muxing Wang; Pengkun Yang; Lili Su

arXiv:2605.14423·cs.LG·May 15, 2026

Abstract

Despite the popularity of the actor-critic method and the practical needs of collaborative policy training, existing works typically either overlook environmental heterogeneity or give up personalization altogether by training a single shared policy across all agents. We consider a federated actor-critic framework in which agents share a common linear subspace representation while maintaining personalized local policy components, and agents iteratively estimate the common subspace, local critic heads, and local policies (i.e., actors). Under canonical single-timescale updates with Markovian sampling, we establish finite-time convergence via a novel joint linear approximation framework. Specifically, we show that the critic error converges to zero at the rate of $\tilde{O}(1/((1-\gamma)^{4} T K))$, and the policy gradient norm converges to zero at the rate of…
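
As a rough illustration of the structure the abstract describes, one possible reading of "common subspace plus local critic heads" is sketched below. The abstract does not state the parameterization or the update rules, so everything here is an assumption: the feature map $\phi$, the shared matrix $B$, the per-agent head $w_k$, the policy parameters $\theta_k$, the step sizes $\alpha$ and $\beta$, and the use of a semi-gradient TD(0) error $\delta_{k,t}$ are illustrative choices, not the paper's notation or algorithm.

$$
\hat{V}_k(s) = \phi(s)^\top B\, w_k, \qquad
\delta_{k,t} = r_{k,t} + \gamma\, \hat{V}_k(s_{k,t+1}) - \hat{V}_k(s_{k,t}),
$$

$$
w_k \leftarrow w_k + \beta\, \delta_{k,t}\, B^\top \phi(s_{k,t}), \qquad
\theta_k \leftarrow \theta_k + \alpha\, \delta_{k,t}\, \nabla_{\theta} \log \pi_{\theta_k}(a_{k,t} \mid s_{k,t}).
$$

In this sketch, "single-timescale" would mean that $\alpha$ and $\beta$ are of the same order (their ratio stays constant), so the critic is not run to convergence between actor steps. The update of the shared $B$ is left out on purpose, since the abstract does not say how the common subspace is estimated or aggregated across agents.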
