# A Study of Network Congestion in Two Supercomputing High-Speed Interconnects

**Authors:** Saurabh Jha, Archit Patke, Jim Brandt, Ann Gentile, Mike Showerman, Eric Roman, Zbigniew T. Kalbarczyk, William T. Kramer, Ravishankar K. Iyer

arXiv: 1907.05312 · 2019-07-12

## TL;DR

This paper presents an empirical analysis of network congestion in petascale supercomputers, comparing two interconnect technologies, and introduces a framework for long-term congestion monitoring to inform better congestion control strategies.

## Contribution

It provides the first field-based congestion characterization for two major high-speed interconnects using a new monitoring framework.

## Key findings

- Congestion patterns differ significantly between Cray Gemini and Cray Aries.
- The monitoring framework effectively captures long-term congestion trends.
- Results highlight the need for tailored congestion control approaches for different topologies.

## Abstract

Network congestion in high-speed interconnects is a major source of application run-time performance variation. Recent years have witnessed a surge of interest from both academia and industry in the development of novel approaches for congestion control at the network level and in application placement, mapping, and scheduling at the system level. However, these studies are based on proxy applications and benchmarks that are not representative of field-congestion characteristics of high-speed interconnects. To address this gap, we present (a) an end-to-end framework for monitoring and analysis to support long-term field-congestion characterization studies, and (b) an empirical study of network congestion in petascale systems across two different interconnect technologies: (i) Cray Gemini, which uses a 3-D torus topology, and (ii) Cray Aries, which uses the DragonFly topology.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.05312/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1907.05312/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/1907.05312/full.md

---
Source: https://tomesphere.com/paper/1907.05312