A primal-dual perspective for distributed TD-learning

Han-Dong Lim; Donghwan Lee

arXiv:2310.00638·cs.LG·May 14, 2025


TL;DR

This paper introduces a novel distributed TD-learning algorithm based on primal-dual optimization that converges efficiently without requiring doubly stochastic communication matrices, applicable to various network scenarios.

Contribution

It develops a primal-dual ODE-based approach for distributed TD-learning that relaxes common network assumptions and analyzes convergence under different step-size and observation models.

Findings

01

The underlying primal-dual ODE dynamics converge exponentially, which underpins the final-iterate analysis under both constant and diminishing step-sizes.

02

Does not require doubly stochastic communication matrices.

03

Applicable to both i.i.d. and Markovian data models.

Abstract

The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of these primal-dual ODE dynamics, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.
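To make "primal-dual ODE dynamics subject to null-space constraints" concrete, here is a minimal sketch of the general idea, not the paper's algorithm. It assumes each agent holds a simple quadratic objective in place of a TD-learning objective, uses the Laplacian of a path graph as the constraint matrix (so the null-space constraint `L x = 0` means all agents agree), and integrates the dynamics with a forward-Euler step. The function names, the graph, the step size, and the values in `c` are all hypothetical choices made for illustration.

```python
import numpy as np


def path_graph_laplacian(n):
    """Laplacian of an undirected path graph with n nodes (illustrative choice)."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def simulate_primal_dual(c, L, step=0.01, num_steps=10_000):
    """Forward-Euler integration of primal-dual dynamics for
    minimize sum_i 0.5 * (x_i - c_i)^2  subject to  L x = 0.

    Primal: x' = -(x - c) - L x - L w
    Dual:   w' = L x
    """
    x = np.zeros_like(c)
    w = np.zeros_like(c)
    for _ in range(num_steps):
        x_dot = -(x - c) - L @ x - L @ w  # local gradient + consensus + dual term
        w_dot = L @ x                     # dual variable accumulates disagreement
        x = x + step * x_dot
        w = w + step * w_dot
    return x, w


# Hypothetical local targets for four agents; their average is 2.0.
c = np.array([1.0, 4.0, -2.0, 5.0])
L = path_graph_laplacian(len(c))
x, w = simulate_primal_dual(c, L)
print(x)
```

In this toy setting every entry of `x` approaches the average of `c` (2.0 here), which is the minimizer of the summed objectives under the consensus constraint. Only the graph Laplacian enters the update, so no doubly stochastic mixing matrix has to be constructed.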

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- The paper is well organized and key results are explained well.
- The authors provide a nice review of recent works on the exponential stability of primal-dual ODE dynamics when the constraint matrix is rank-deficient.
- The authors study the exponential stability of a primal-dual ODE dynamics, improving the dependence on problem parameters.
- The authors propose a new distributed TD-learning algorithm and characterize the solution error rates under both i.i.d. and Markovian sampling.

Weaknesses

- The exponential stability of primal-dual ODE dynamics is known in the literature when the constraint matrix is rank-deficient. The improvement is only a constant factor for a special case of objective and constraint functions, which might not be very important to the TD analysis.
- The proposed distributed TD learning is based on a known distributed primal-dual ODE dynamics. The error rate analyses follow the Lyapunov-based analysis from the previous work. The technical novelty is questionable.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

The distributed TD learning is an important question and this paper has provided new insights. The results seem to be correct.

Weaknesses

1. The paper only considers the average reward scenario. However, there can be other reward scenarios (cooperative or competitive). Can the result be extended to those setups?
2. There is already quite a bit of work on the multi-agent RL framework for the average-reward case. Please see [A1]. The authors should discuss, in terms of both methodology and results, whether they are related or different. The above paper provides the sample complexity bound, and even consider general function appr

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

• The strengths come from three parts.
1) The first one is the proof is very sound. Based on that, in this paper, the convergence results are better than existing papers.
2) The second part is that the literature reviews seem to be very detailed, carefully performed, and up to date.
3) The author used a lot of citations throughout the whole paper.

Weaknesses

• However, there are several weaknesses from my perspective.
• The first one is that I think the paper is not organized very well: the author mentions several references many times throughout the paper, which feels very tedious; when reading section 3, I was confused since I don’t know the reason for introducing and proving those lemmas until I read section 4; also, the notations in section 3 do not closely correspond to the notations used in the rest of the paper.
• The second one is I

Code & Models

Repositories

limaries30/distributed-td-learning
Official

Taxonomy

Topics: Opinion Dynamics and Social Influence · Neural Networks Stability and Synchronization · Complex Network Analysis Techniques