Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing
Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler

TL;DR
This paper proposes an application-aware routing algorithm for Dragonfly networks that reduces network noise and improves performance by adjusting the probability of selecting minimal paths according to application characteristics.
Contribution
It introduces a novel routing algorithm that dynamically adjusts path selection probabilities to mitigate network noise in HPC systems.
Findings
Network noise can be estimated, and it can be reduced by routing algorithms that select minimal paths with a higher probability.
Application-aware routing improves performance on real-world HPC workloads.
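Favoring minimal paths avoids the extra traffic that non-minimal routes generate, which would otherwise cause congestion for other applications and for the application itself.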
Abstract
System noise can negatively impact the performance of HPC systems, and the interconnection network is one of the main factors contributing to this problem. To mitigate this effect, adaptive routing sends packets on non-minimal paths if they are less congested. However, while this may mitigate interference caused by congestion, it also generates more traffic since packets traverse additional hops, causing in turn congestion on other applications and on the application itself. In this paper, we first describe how to estimate network noise. By following these guidelines, we show how noise can be reduced by using routing algorithms which select minimal paths with a higher probability. We exploit this knowledge to design an algorithm which changes the probability of selecting minimal paths according to the application characteristics. We validate our solution on microbenchmarks and…
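The idea of selecting minimal paths "with a higher probability" can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a UGAL-style cost comparison (queue occupancy times hop count) and a hypothetical `bias` parameter added to the non-minimal cost. The names `choose_path` and `minimal_fraction`, the hop counts, and the random queue occupancies are all illustrative assumptions.

```python
import random


def choose_path(min_queue, nonmin_queue, min_hops, nonmin_hops, bias=0.0):
    """Pick a path by comparing estimated costs (assumed UGAL-style rule).

    The cost of a path is approximated as queue occupancy * hop count.
    `bias` (hypothetical knob) is added to the non-minimal cost, so a
    larger bias makes the minimal path win more often.
    """
    min_cost = min_queue * min_hops
    nonmin_cost = nonmin_queue * nonmin_hops + bias
    return "minimal" if min_cost <= nonmin_cost else "non-minimal"


def minimal_fraction(bias, trials=10_000, seed=0):
    """Fraction of packets routed minimally under random queue occupancies.

    Queue occupancies are drawn uniformly from 0..100 purely for
    illustration; real occupancies depend on the traffic pattern.
    """
    rng = random.Random(seed)
    minimal = 0
    for _ in range(trials):
        min_q = rng.randint(0, 100)
        nonmin_q = rng.randint(0, 100)
        # Illustrative hop counts: 3 for minimal, 5 for non-minimal.
        if choose_path(min_q, nonmin_q, 3, 5, bias) == "minimal":
            minimal += 1
    return minimal / trials


if __name__ == "__main__":
    for bias in (0, 100, 200, 1000):
        print(f"bias={bias:5d}  minimal fraction={minimal_fraction(bias):.3f}")
```

In this sketch, raising `bias` trades tolerance of congestion on the minimal path for less extra traffic from longer routes. An application-aware scheme in the spirit of the abstract would set such a knob from application characteristics rather than fixing it globally.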
