Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang

TL;DR
This paper introduces ComDML, a decentralized workload balancing method for multi-agent learning that reduces training time in heterogeneous environments by optimizing workload offloading based on agents' capacities.
Contribution
The paper proposes a novel decentralized workload balancing approach using local-loss split training and integer programming to optimize offloading in multi-agent learning.
Findings
ComDML significantly reduces training time compared to existing methods.
It maintains model accuracy while balancing workloads.
Convergence is proven for both convex and non-convex functions.
Abstract
Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task size) may lead to substantial variations in training time. This heterogeneity creates a bottleneck, lengthening the overall training time due to straggler effects and potentially wasting spare resources of faster agents. To minimize training time in heterogeneous environments, we present Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning (ComDML), which balances the workload among agents through a decentralized approach. Leveraging local-loss split training, ComDML enables parallel updates, where slower agents offload part of their workload to faster agents. To minimize the overall training time, ComDML optimizes the workload balancing by jointly…
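To make the offloading idea concrete, here is a minimal sketch of how a slow agent might pick how many layers to hand to a faster one. It is a toy model under stated assumptions, not the paper's formulation: the function names (`round_time`, `best_offload`), the equal per-layer cost, the fixed `comm_time` penalty, and the brute-force search over integer split points (a stand-in for the integer program the summary mentions) are all hypothetical. It assumes the two sides train in parallel, as local-loss split training allows, so the round time for the pair is the slower of the two sides.

```python
def round_time(total_layers, offloaded, slow_speed, fast_speed, comm_time):
    """Time for one round of a (slow, fast) agent pair in this toy model.

    Assumptions (hypothetical, not from the paper):
      - every layer costs the same, and speed is in layers per second;
      - the fast agent trains its own full model plus the offloaded layers;
      - a fixed communication cost is paid once whenever anything is offloaded;
      - both sides update in parallel, so the pair finishes when the slower
        side finishes.
    """
    slow_side = (total_layers - offloaded) / slow_speed
    fast_side = (total_layers + offloaded) / fast_speed
    if offloaded > 0:
        fast_side += comm_time
    return max(slow_side, fast_side)


def best_offload(total_layers, slow_speed, fast_speed, comm_time):
    """Brute-force the integer number of layers to offload.

    The slow agent always keeps at least one layer, so candidates are
    0 .. total_layers - 1. Returns (layers_to_offload, resulting_round_time).
    """
    candidates = range(total_layers)
    best = min(
        candidates,
        key=lambda m: round_time(total_layers, m, slow_speed, fast_speed, comm_time),
    )
    return best, round_time(total_layers, best, slow_speed, fast_speed, comm_time)


if __name__ == "__main__":
    # Illustrative numbers only: a 10-layer model, a 4x faster helper agent.
    layers, t = best_offload(total_layers=10, slow_speed=1.0, fast_speed=4.0, comm_time=0.5)
    print(f"offload {layers} layers, round time {t:.2f}s")
```

In this toy setting, offloading nothing leaves the pair waiting on the straggler, while offloading too much turns the fast agent into the new bottleneck; the search settles between the two. A real system would also have to decide which agents to pair and account for heterogeneous links, which this sketch leaves out.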
Taxonomy
Topics: Advanced Data Processing Techniques · Multi-Agent Systems and Negotiation · Robotics and Automated Systems
