Characterization of Real Communication Patterns and Congestion Dynamics in HPC Interconnection Networks
Miguel Sánchez de La Rosa, Gabriel Gomez-Lopez, Alejandro Baviera, Jose Duro, Francisco J. Andújar, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Alfaro, Maria E. Gomez, Julio Sahuquillo, José L. Sánchez, Francisco J. Quiles

TL;DR
This paper introduces a methodology using the VEF Traces framework to characterize, model, and simulate communication patterns and congestion in HPC interconnection networks, aiding performance analysis.
Contribution
It extends the VEF Traces framework with new tools for congestion characterization and applies them to the analysis of real HPC application traces.
Findings
Identified congestion scenarios in HPC network configurations.
Extended VEF framework enables detailed congestion analysis.
Analyzed traces from multiple supercomputers and applications.
Abstract
The interconnection network is a key component of supercomputers and data centers, and its design must cope with the increasing communication demands of current applications and services; otherwise, it may become a system bottleneck. The most challenging network design issues are the topology, routing algorithm, flow control, and power efficiency. However, even the most efficient interconnection networks may suffer severe performance degradation due to congestion, especially under specific network traffic patterns generated by communication operations in high-performance computing (HPC), deep learning training, or online data-intensive services. In this context, characterizing and modeling these communication operations and the network traffic patterns they generate is a fundamental challenge for studying their impact on network performance. This paper presents a methodology, based…
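The abstract treats characterizing the traffic patterns that communication operations generate as the basis for studying congestion. As a minimal sketch of what such a characterization can involve, the code below aggregates a trace into a source-destination traffic matrix and flags destinations whose incoming load is well above the average. Many-to-one (incast) hot spots like these are a common source of congestion. This is not the VEF Traces format or API. The `Message` record, the function names `traffic_matrix` and `hot_spot_destinations`, and the default 2x threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Message(NamedTuple):
    """One point-to-point transfer from a trace (hypothetical record layout, not the VEF format)."""
    src: int
    dst: int
    size_bytes: int


def traffic_matrix(messages: Iterable[Message]) -> Dict[Tuple[int, int], int]:
    """Sum the bytes sent for every (source, destination) pair in the trace."""
    matrix: Dict[Tuple[int, int], int] = defaultdict(int)
    for m in messages:
        matrix[(m.src, m.dst)] += m.size_bytes
    return dict(matrix)


def hot_spot_destinations(matrix: Dict[Tuple[int, int], int], factor: float = 2.0) -> List[int]:
    """Return destinations whose incoming bytes exceed `factor` times the mean per-destination load.

    The threshold rule is an illustrative assumption, not a method taken from the paper.
    """
    incoming: Dict[int, int] = defaultdict(int)
    for (_, dst), size in matrix.items():
        incoming[dst] += size
    if not incoming:
        return []
    mean_load = sum(incoming.values()) / len(incoming)
    return sorted(d for d, load in incoming.items() if load > factor * mean_load)
```

For example, consider a synthetic trace in which ranks 1 to 7 each send 1 MiB to rank 0 and all eight ranks also run a small ring exchange. `hot_spot_destinations(traffic_matrix(trace))` flags only rank 0. A congestion study would then go further and relate such endpoint hot spots to the topology and routing in use.

```python
trace = [Message(i, 0, 1 << 20) for i in range(1, 8)] + [Message(i, (i + 1) % 8, 4096) for i in range(8)]
print(hot_spot_destinations(traffic_matrix(trace)))  # [0]
```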
