Learning Multiple Coordinated Agents under Directed Acyclic Graph Constraints
Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang

TL;DR
This paper introduces a new multi-agent reinforcement learning approach that leverages directed acyclic graph structures to improve coordination and learning efficiency, validated through real-world and benchmark environments.
Contribution
It presents a novel MARL method exploiting DAG constraints, with a new surrogate value function and a training algorithm involving leader and reward generator agents.
Findings
Outperforms non-DAG MARL methods in four environments
Proves that the surrogate value function is a lower bound of the optimal value function
Demonstrates effectiveness in real-world scheduling tasks
Abstract
This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints. Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance. Theoretically, we propose a novel surrogate value function based on a MARL model with synthetic rewards (MARLM-SR) and prove that it serves as a lower bound of the optimal value function. Computationally, we propose a practical training algorithm that exploits the new notions of a leader agent and a reward generator and distributor agent to guide the decomposed follower agents to better explore the parameter space in environments with DAG constraints. Empirically, we exploit four DAG environments, including a real-world scheduling task for one of Intel's high-volume packaging and test factories, to…
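To make the phrase "DAG constraints between agents" concrete, here is a minimal sketch of what such a dependency structure can look like. It is not the paper's training algorithm. It only shows that agents arranged in a DAG can be ordered so that each one acts after all of its predecessors. The agent names and the `execution_order` helper are hypothetical and chosen for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG: each agent maps to the set of agents it depends on.
# The names are illustrative and do not come from the paper.
predecessors = {
    "cutting": set(),
    "molding": set(),
    "assembly": {"cutting", "molding"},
    "test": {"assembly"},
}


def execution_order(predecessors):
    """Return one ordering in which every agent follows all of its predecessors."""
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(predecessors)
# Position of each agent in the ordering, used to check the constraint.
position = {agent: index for index, agent in enumerate(order)}
```

Any ordering returned this way respects the DAG: `position[p] < position[a]` holds for every agent `a` and each of its predecessors `p`. A cyclic dependency would make `TopologicalSorter` raise `graphlib.CycleError`, which is one simple way to validate that a proposed agent structure is in fact acyclic.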
Taxonomy
Topics: Reinforcement Learning in Robotics · Scheduling and Optimization Algorithms
