A Multi-Agent, Policy-Gradient approach to Network Routing

Nigel Tao; Jonathan Baxter; Lex Weaver

arXiv:2512.03211 · cs.LG · December 4, 2025 · 56 cites

Open Access

TL;DR

This paper applies OLPOMDP, a policy-gradient reinforcement learning algorithm, to simulated network routing. Distributed routers learn cooperative behavior without explicit inter-agent communication, and shaping the reward signal to penalize certain sub-optimal patterns dramatically improves the convergence rate.

Contribution

It shows that multiple distributed agents (routers) trained with the OLPOMDP policy-gradient algorithm can learn cooperative routing in simulation without explicit inter-agent communication, and that reward shaping speeds up convergence.

Findings

01 Agents learned cooperative routing behavior

02 Reward shaping improved convergence rate

03 Distributed agents avoided detrimental behaviors

Abstract

Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.
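
As a rough illustration of the kind of update the abstract refers to, the sketch below shows one router as an OLPOMDP-style agent: it keeps an eligibility trace z of the policy's log-gradient, decayed by a factor beta, and moves its parameters by alpha * reward * z after every step. The tabular softmax policy, the class and function names, the step sizes, and the two-link toy network are all assumptions made for this example, and `shaped_reward` with its loop penalty is a hypothetical stand-in, since the abstract does not say which behavior patterns were penalized.

```python
import math
import random


def softmax(prefs):
    """Turn a list of preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class OlpomdpAgent:
    """One router: a tabular softmax policy over outgoing links (assumed form)."""

    def __init__(self, n_obs, n_actions, alpha=0.01, beta=0.5, seed=0):
        self.theta = [[0.0] * n_actions for _ in range(n_obs)]  # policy parameters
        self.trace = [[0.0] * n_actions for _ in range(n_obs)]  # eligibility trace z
        self.alpha = alpha  # step size
        self.beta = beta    # trace discount in [0, 1)
        self.rng = random.Random(seed)

    def act(self, obs):
        """Sample a link and update z <- beta * z + grad log pi(action | obs)."""
        probs = softmax(self.theta[obs])
        action = self.rng.choices(range(len(probs)), weights=probs)[0]
        for row in self.trace:
            for j in range(len(row)):
                row[j] *= self.beta
        # For a softmax policy, d log pi(a) / d theta_j = 1[j == a] - pi(j).
        for j, p in enumerate(probs):
            self.trace[obs][j] += (1.0 if j == action else 0.0) - p
        return action

    def learn(self, reward):
        """Apply theta <- theta + alpha * reward * z using only the scalar reward."""
        for theta_row, trace_row in zip(self.theta, self.trace):
            for j in range(len(theta_row)):
                theta_row[j] += self.alpha * reward * trace_row[j]


def shaped_reward(delay, revisited_node, loop_penalty=5.0):
    """Hypothetical shaping: negative delay, minus a penalty for a routing loop."""
    return -delay - (loop_penalty if revisited_node else 0.0)


if __name__ == "__main__":
    # Toy network (assumed): link 0 costs 1 time unit, link 1 costs 3.
    agent = OlpomdpAgent(n_obs=1, n_actions=2, seed=1)
    for _ in range(5000):
        link = agent.act(0)
        delay = 1.0 if link == 0 else 3.0
        agent.learn(shaped_reward(delay, revisited_node=False))
    print(softmax(agent.theta[0]))
```

In a multi-agent setting each router would hold its own `OlpomdpAgent` and every agent would receive the same scalar reward, so no parameters or messages need to be exchanged between agents; that reading is consistent with the abstract but the details here are illustrative only.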

Taxonomy

Topics: Network Traffic and Congestion Control · Peer-to-Peer Network Technologies