MXDAG: A Hybrid Abstraction for Cluster Applications
Weitao Wang, Sushovan Das, Xinyu Crystal Wu, Zhuang Wang, Ang Chen, T. S. Eugene Ng

TL;DR
MXDAG introduces a hybrid abstraction that explicitly models both compute and network tasks in distributed applications, enabling better co-scheduling and improved end-to-end performance.
Contribution
It presents MXDAG, a novel abstraction that captures dependencies of compute and network tasks for more effective co-scheduling in distributed applications.
Findings
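Co-scheduling compute and network tasks can move applications toward globally optimal end-to-end performance.
Modeling compute and network tasks explicitly captures dependencies and interactions that a compute-only DAG or a network-only Coflow view leaves out.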
Potential for more efficient distributed system management.
Abstract
Distributed applications, such as database queries and distributed training, consist of both compute and network tasks. DAG-based abstraction primarily targets compute tasks and has no explicit network-level scheduling. In contrast, Coflow abstraction collectively schedules network flows among compute tasks but lacks the end-to-end view of the application DAG. Because of the dependencies and interactions between these two types of tasks, it is sub-optimal to only consider one of them. We argue that co-scheduling of both compute and network tasks can help applications towards the globally optimal end-to-end performance. However, none of the existing abstractions can provide fine-grained information for co-scheduling. We propose MXDAG, an abstraction to treat both compute and network tasks explicitly. It can capture the dependencies and interactions of both compute and network tasks…
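To make the idea of a DAG that treats compute and network tasks explicitly more concrete, here is a minimal sketch. It is not the paper's implementation: the `Task` class, the `kind` field, the task names, and the durations are all hypothetical, chosen only to illustrate how an end-to-end view over both task types could expose the critical path of a small map/shuffle/reduce job.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    # Hypothetical node type: every task is either "compute" or "network".
    name: str
    kind: str
    duration: float  # illustrative time units, not measured values
    deps: list = field(default_factory=list)


def finish_times(tasks):
    """Return the earliest finish time of each task, assuming no resource contention."""
    by_name = {t.name: t for t in tasks}
    # TopologicalSorter takes a mapping of node -> predecessors.
    order = TopologicalSorter({t.name: t.deps for t in tasks}).static_order()
    finish = {}
    for name in order:
        task = by_name[name]
        start = max((finish[d] for d in task.deps), default=0.0)
        finish[name] = start + task.duration
    return finish


# Toy job: two map tasks, each followed by a shuffle transfer, then one reduce.
tasks = [
    Task("map_a", "compute", 3.0),
    Task("map_b", "compute", 5.0),
    Task("shuffle_a", "network", 2.0, ["map_a"]),
    Task("shuffle_b", "network", 2.0, ["map_b"]),
    Task("reduce", "compute", 4.0, ["shuffle_a", "shuffle_b"]),
]

finish = finish_times(tasks)
makespan = max(finish.values())
print(makespan)  # 11.0
```

In this toy instance `shuffle_a` can finish 2 time units before `shuffle_b`, so a scheduler that sees both task types could, in principle, give `shuffle_b` bandwidth first without delaying `reduce`. A view limited to compute tasks or to network flows alone would not expose that slack.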
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Distributed systems and fault tolerance
