Modeling and Analysis of Application Interference on Dragonfly+
Yao Kang, Xin Wang, Neil McGlohon, Misbah Mubarak, Sudheer Chunduri, Zhiling Lan

TL;DR
This paper evaluates how application communication interference affects performance on Dragonfly+ supercomputer networks using simulation, revealing that intra-job interference can severely degrade performance, while certain communication patterns are more resilient.
Contribution
It provides a quantitative analysis of application interference on Dragonfly+ networks using the CODES toolkit, highlighting the impact of different job placement policies and communication patterns.
Findings
Intra-job interference causes significant performance degradation.
Job isolation reduces inter-job interference for certain communication patterns.
Applications with one-to-all communication are resilient to network interference.
Abstract
The Dragonfly class of networks is considered a promising interconnect for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource-sharing design. Event-driven network simulators are indispensable tools for navigating complex system design. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This study looks at the impact of communication interference from a user's perspective. Specifically, for a given application submitted by a user, we examine how this application will behave with the existing workload running in the system under different job placement policies. Our simulation study considers hundreds of experiment…
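To make the idea of a job placement policy concrete, here is a minimal sketch that contrasts two common policies, contiguous and random node allocation, on a system of the size given in the abstract. It is an illustration only and not the CODES toolkit or the authors' setup: the group size `NODES_PER_GROUP`, the job sizes, and the function names are all assumptions introduced for this example.

```python
import random

TOTAL_NODES = 3456       # system size stated in the abstract
NODES_PER_GROUP = 128    # hypothetical group size (gives 27 groups); not from the source


def contiguous_placement(free_nodes, job_size):
    """Take the lowest-numbered free nodes so the job spans as few groups as possible."""
    return sorted(free_nodes)[:job_size]


def random_placement(free_nodes, job_size, seed=0):
    """Pick free nodes uniformly at random, scattering the job across groups."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(free_nodes), job_size))


def groups_touched(nodes):
    """Return the set of group indices that contain at least one of the given nodes."""
    return {n // NODES_PER_GROUP for n in nodes}


free = set(range(TOTAL_NODES))

# Hypothetical job sizes, chosen only for illustration.
job_a = contiguous_placement(free, 256)
job_b = random_placement(free - set(job_a), 256)

print("contiguous job spans groups:", len(groups_touched(job_a)))
print("random job spans groups:", len(groups_touched(job_b)))
print("groups shared by both jobs:", len(groups_touched(job_a) & groups_touched(job_b)))
```

Under these assumptions, the contiguous job stays inside a small number of groups, while the randomly placed job tends to spread over many groups. Counting the groups two jobs share gives a rough feel for where their traffic could meet; it does not model routing or link contention, which is what a network simulator evaluates.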
Taxonomy
Topics: Multimedia Communication and Technology · Simulation and Modeling Applications · Video Analysis and Summarization
