Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems
Xi Shi, Mengxin Zheng, Qian Lou

TL;DR
This paper introduces LAMaS, a learning-based framework that optimizes latency in parallel multi-agent systems, reducing critical path length by 38-46% compared to baselines while maintaining or improving task performance.
Contribution
It presents a novel latency-aware orchestration framework for parallel multi-agent systems that explicitly optimizes execution latency and constructs efficient execution topologies.
Findings
Reduces critical path length by 38-46% compared to baselines.
Maintains or improves task performance while optimizing for latency.
Demonstrates effectiveness across multiple benchmarks.
Abstract
Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severely limiting their scalability and usability in time-sensitive scenarios. Most existing approaches primarily optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, making them less optimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), a latency-aware multi-agent orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency under…
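To make the notion of a critical execution path concrete, here is a minimal sketch of how the latency of a parallel execution topology could be measured. It is an illustration only, not the authors' implementation: the agent names, per-call latencies, and the helper `critical_path_latency` are all hypothetical. The assumption is that each agent call starts as soon as every call it depends on has finished, so end-to-end latency equals the longest dependency chain rather than the sum of all calls.

```python
from graphlib import TopologicalSorter


def critical_path_latency(latency, deps):
    """Return the longest-path latency through a DAG of agent calls.

    latency: dict mapping agent name -> seconds for that call (made-up values)
    deps:    dict mapping agent name -> set of agents whose output it needs
    """
    graph = {node: deps.get(node, set()) for node in latency}
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        # A call can start once its slowest dependency has finished.
        start = max((finish[d] for d in graph[node]), default=0.0)
        finish[node] = start + latency[node]
    return max(finish.values(), default=0.0)


# Hypothetical four-agent topology: "search" and "code" run in parallel
# after "plan", and "review" waits for both.
latency = {"plan": 2.0, "search": 3.0, "code": 4.0, "review": 1.0}
deps = {"search": {"plan"}, "code": {"plan"}, "review": {"search", "code"}}

sequential = sum(latency.values())                # every call back to back
parallel = critical_path_latency(latency, deps)   # plan -> code -> review
print(sequential, parallel)
```

In this toy graph the sequential cost is 10.0 seconds while the critical path is 7.0 seconds. Shortening that path is the quantity a latency-aware controller would aim to reduce when choosing a topology.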
Taxonomy
Topics: AI-based Problem Solving and Planning · Constraint Satisfaction and Optimization · Graph Theory and Algorithms
