Characterization of Real Communication Patterns and Congestion Dynamics in HPC Interconnection Networks

Miguel Sánchez de La Rosa; Gabriel Gomez-Lopez; Alejandro Baviera; Jose Duro; Francisco J. Andújar; Jesus Escudero-Sahuquillo; Pedro J. Garcia; Francisco J. Alfaro; Maria E. Gomez; Julio Sahuquillo; José L. Sánchez; Francisco J. Quiles

arXiv:2604.16088·cs.NI·April 20, 2026

TL;DR

This paper introduces a methodology using the VEF Traces framework to characterize, model, and simulate communication patterns and congestion in HPC interconnection networks, aiding performance analysis.

Contribution

The paper extends the VEF Traces framework with new tools for congestion characterization and applies them to the analysis of real HPC application traces.

Findings

01. Identified congestion scenarios in HPC network configurations.

02. Extended VEF framework enables detailed congestion analysis.

03. Analyzed traces from multiple supercomputers and applications.

Abstract

The interconnection network is a key component of supercomputers and data centers, and its design must cope with the increasing communication demands of current applications and services; otherwise, it may become a system bottleneck. The most challenging network design issues are the topology, routing algorithm, flow control, and power efficiency. However, even the most efficient interconnection networks may suffer severe performance degradation due to congestion, especially under specific network traffic patterns generated by communication operations in high-performance computing (HPC), deep learning training, or online data-intensive services. In this context, characterizing and modeling these communication operations and the network traffic patterns they generate is a fundamental challenge for studying their impact on network performance. This paper presents a methodology, based…
