ucTrace: A Multi-Layer Profiling Tool for UCX-driven Communication
Emir Gencer (1), Mohammad Kefah Taha Issa (1), Ilyas Turimbetov (1), James D. Trotter (2), Didem Unat (1) ((1) Koç University, Turkey, (2) Simula Research Laboratory, Norway)

TL;DR
ucTrace is a new profiling tool that visualizes UCX-driven communication in HPC systems at a fine-grained level, aiding the optimization and debugging of large-scale MPI applications.
Contribution
The paper introduces ucTrace, a novel profiler that visualizes UCX communication at a fine-grained level, linking MPI operations to device and transport-layer behaviors.
Findings
ucTrace effectively visualizes MPI communication patterns.
It reveals transport-layer behaviors and device interactions.
It demonstrates utility in optimizing large-scale HPC applications.
Abstract
UCX is a communication framework that enables low-latency, high-bandwidth communication in HPC systems. With its unified API, UCX facilitates efficient data transfers across multi-node CPU-GPU clusters. UCX is widely used as the transport layer for MPI, particularly in GPU-aware implementations. However, existing profiling tools lack fine-grained communication traces at the UCX level, do not capture transport-layer behavior, or are limited to specific MPI implementations. To address these gaps, we introduce ucTrace, a novel profiler that exposes and visualizes UCX-driven communication in HPC environments. ucTrace provides insights into MPI workflows by profiling message passing at the UCX level, linking operations between hosts and devices (e.g., GPUs and NICs) directly to their originating MPI functions. Through interactive visualizations of process- and device-specific interactions,…
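The abstract describes tying UCX-level transfers between hosts and devices back to the MPI functions that issued them, then presenting process- and device-specific interactions. The sketch below illustrates that kind of attribution and aggregation on an in-memory list of trace records. It is not ucTrace's implementation: the `UcxEvent` record layout, the `aggregate` and `rank_matrix` helpers, and the device labels are all hypothetical. The transport strings (`cuda_ipc`, `rc_mlx5`) are used only as illustrative UCX transport names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UcxEvent:
    """Hypothetical UCX-level trace record (not ucTrace's actual format)."""
    src_rank: int
    dst_rank: int
    src_device: str   # hypothetical label, e.g. "host" or "gpu0"
    dst_device: str
    transport: str    # illustrative UCX transport name, e.g. "cuda_ipc"
    nbytes: int
    mpi_call: str     # originating MPI function, e.g. "MPI_Allreduce"


def aggregate(events):
    """Sum bytes per (MPI call, source rank/device, destination rank/device, transport)."""
    totals = defaultdict(int)
    for e in events:
        key = (e.mpi_call, (e.src_rank, e.src_device),
               (e.dst_rank, e.dst_device), e.transport)
        totals[key] += e.nbytes
    return dict(totals)


def rank_matrix(events, nranks):
    """Process-level communication matrix: entry [i][j] is bytes sent from rank i to rank j."""
    m = [[0] * nranks for _ in range(nranks)]
    for e in events:
        m[e.src_rank][e.dst_rank] += e.nbytes
    return m


if __name__ == "__main__":
    events = [
        UcxEvent(0, 1, "gpu0", "gpu0", "cuda_ipc", 4096, "MPI_Allreduce"),
        UcxEvent(1, 0, "gpu0", "gpu0", "cuda_ipc", 4096, "MPI_Allreduce"),
        UcxEvent(0, 1, "host", "host", "rc_mlx5", 64, "MPI_Send"),
        UcxEvent(0, 1, "gpu0", "gpu0", "cuda_ipc", 1024, "MPI_Allreduce"),
    ]
    for key, nbytes in aggregate(events).items():
        print(key, nbytes)
    print(rank_matrix(events, 2))  # prints [[0, 5184], [4096, 0]]
```

Keeping both the per-call, per-device breakdown and the coarser rank-to-rank matrix mirrors the idea of viewing the same traffic at the MPI, process, and device levels. A real profiler would populate such records by intercepting UCX operations at run time rather than from a hand-written list.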
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
