Learning Multiple Coordinated Agents under Directed Acyclic Graph Constraints

Jaeyeon Jang; Diego Klabjan; Han Liu; Nital S. Patel; Xiuqi Li; Balakrishnan Ananthanarayanan; Husam Dauod; Tzung-Han Juang

arXiv:2307.07529 · cs.LG · July 18, 2023 · 1 cites

Open Access

TL;DR

This paper introduces a new multi-agent reinforcement learning approach that leverages directed acyclic graph structures to improve coordination and learning efficiency, validated through real-world and benchmark environments.

Contribution

It presents a novel MARL method exploiting DAG constraints, with a new surrogate value function and a training algorithm involving leader and reward generator agents.

Findings

01. Outperforms non-DAG MARL methods in four environments

02. Proves that the surrogate value function is a lower bound of the optimal value function

03. Demonstrates effectiveness in a real-world scheduling task

Abstract

This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints. Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance. Theoretically, we propose a novel surrogate value function based on a MARL model with synthetic rewards (MARLM-SR) and prove that it serves as a lower bound of the optimal value function. Computationally, we propose a practical training algorithm that uses the new notions of a leader agent and a reward generator and distributor agent to guide the decomposed follower agents to better explore the parameter space in environments with DAG constraints. Empirically, we use four DAG environments, including a real-world scheduling task for one of Intel's high-volume packaging and test factories, to…
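To make the phrase "DAG constraints" concrete, here is a minimal Python sketch of agents whose work depends on the outputs of other agents. It is not the paper's training algorithm and does not model the leader, the reward generator and distributor, or the surrogate value function. The agent names and the `AGENT_PARENTS` map are hypothetical, chosen only for illustration. The sketch uses the standard-library `graphlib.TopologicalSorter` to derive an order in which every agent acts after all of its parents.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration, not from the paper):
# each agent maps to the set of agents whose outputs it needs before acting.
AGENT_PARENTS = {
    "cut": set(),
    "mold": set(),
    "assemble": {"cut", "mold"},
    "test": {"assemble"},
}


def execution_order(parents):
    """Return one order in which every agent runs after all of its parents."""
    return list(TopologicalSorter(parents).static_order())


def respects_dag(order, parents):
    """Check that each agent appears later in `order` than every agent it depends on."""
    position = {agent: i for i, agent in enumerate(order)}
    return all(
        position[parent] < position[agent]
        for agent, agent_parents in parents.items()
        for parent in agent_parents
    )


order = execution_order(AGENT_PARENTS)
print(order, respects_dag(order, AGENT_PARENTS))
```

Several valid orders can exist for the same graph (here "cut" and "mold" may come in either order), so the sketch checks the ordering property rather than one fixed sequence.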

Taxonomy

Topics: Reinforcement Learning in Robotics · Scheduling and Optimization Algorithms