Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing
Hangyu Mao, Zhibo Gong, and Zhen Xiao

TL;DR
This paper investigates reward design in cooperative multi-agent reinforcement learning for packet routing, proposing mixed and adaptive reward signals to improve policy learning and convergence.
Contribution
It introduces mixed and adaptive reward signals tailored for cooperative MARL in packet routing, addressing the shortcomings of the conventional global and local reward schemes.
Findings
Adaptive reward signals outperform fixed schemes in experiments
Mixed rewards lead to more stable and efficient policies
Reward design significantly impacts MARL performance in routing tasks
Abstract
In cooperative multi-agent reinforcement learning (MARL), how to design a suitable reward signal to accelerate learning and stabilize convergence is a critical problem. The global reward signal assigns the same global reward to all agents without distinguishing their contributions, while the local reward signal gives each agent a different local reward based solely on its individual behavior. Both approaches have shortcomings: the former might encourage lazy agents, while the latter might produce selfish agents. In this paper, we study the reward design problem in cooperative MARL based on packet routing environments. First, we show that these two reward signals are prone to producing suboptimal policies. Then, inspired by several observations and considerations, we design mixed reward signals that can be used off the shelf to learn better policies.…
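The abstract contrasts global and local reward signals and then introduces mixed ones. As a minimal sketch, assuming a mixed signal is a convex combination of the two, the schemes can be compared in Python. The weight `alpha`, the mean-based team reward, the function names, and the example delay values are all illustrative assumptions, not necessarily the paper's exact definitions.

```python
from typing import List


def global_rewards(local: List[float]) -> List[float]:
    """Global signal: every agent receives the same team reward.

    Assumption (hypothetical): the team reward is the mean of the local rewards.
    """
    team = sum(local) / len(local)
    return [team] * len(local)


def mixed_rewards(local: List[float], alpha: float) -> List[float]:
    """Hypothetical mixed signal: a convex blend of global and local rewards.

    alpha = 1.0 recovers the pure global signal; alpha = 0.0 the pure local one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    shared = global_rewards(local)
    return [alpha * g + (1.0 - alpha) * l for g, l in zip(shared, local)]


# Three routers; local reward = negative mean packet delay (hypothetical values).
local = [-2.0, -4.0, -6.0]
print(global_rewards(local))      # [-4.0, -4.0, -4.0]
print(mixed_rewards(local, 0.5))  # [-3.0, -4.0, -5.0]
```

In this sketch the shared term ties each agent's reward to team performance, which is meant to discourage selfish behavior, while the local term keeps each agent's own contribution visible in its reward, which is meant to discourage laziness.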
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Reinforcement Learning in Robotics · Gene Regulatory Network Analysis
