PICO: Performance Insights for Collective Operations
Saverio Pasqualoni, Tommaso Bonato, Lorenzo Piarulli, Torsten Hoefler, Marco Canini, Daniele De Sensi

TL;DR
PICO is an open-source framework for systematically benchmarking and tuning collective operations in HPC and AI workloads. Its fine-grained analysis shows that a library's default algorithm can be far from the best available choice.
Contribution
PICO introduces a platform-agnostic, reproducible benchmarking framework with backend-adaptive parameter selection across MPI and NCCL and fine-grained profiling of collective operations.
Findings
Default algorithms can be up to 5x slower than optimal choices.
PICO isolates topology-sensitive algorithmic effects.
Optimized profiles reduce training times by up to 44% in AI workloads.
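To make the "up to 5x slower" finding concrete, the following minimal sketch shows how such a slowdown factor could be computed from per-algorithm timings of a single collective. It is not PICO's code or API: the function name `slowdown_vs_best`, the algorithm labels, and the timing values are hypothetical placeholders chosen for illustration, and are not measurements from the paper.

```python
from typing import Dict, Tuple


def slowdown_vs_best(timings_us: Dict[str, float], default_algo: str) -> Tuple[str, float]:
    """Return (fastest algorithm, default_time / fastest_time).

    timings_us maps an algorithm label to its measured latency in
    microseconds for one collective at one message size.
    """
    if default_algo not in timings_us:
        raise KeyError(f"no timing recorded for default algorithm {default_algo!r}")
    # The fastest algorithm is the one with the smallest measured latency.
    best_algo = min(timings_us, key=timings_us.get)
    return best_algo, timings_us[default_algo] / timings_us[best_algo]


# Placeholder timings (hypothetical, NOT results from the paper).
example_timings_us = {
    "ring": 120.0,
    "recursive_doubling": 40.0,
    "binomial_tree": 60.0,
}

best, factor = slowdown_vs_best(example_timings_us, default_algo="ring")
print(f"best={best}, default is {factor:.1f}x slower")
```

In practice a comparison like this would be repeated per collective, message size, and node count, since the fastest algorithm typically changes across those dimensions.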
Abstract
Collective operations are cornerstones of both HPC applications and large-scale AI training and inference, yet benchmarking them in a systematic and reproducible way remains difficult on modern systems due to the complexity of their hardware and software stacks. Existing suites primarily report end-to-end timings and offer limited support for controlled algorithm and configuration selection, fine-grained profiling, and capturing the runtime environment. We present PICO (Performance Insights for Collective Operations), an open-source framework that decouples portable experiment setup from platform execution, provides a backend-adaptive parameter selection interface across MPI and NCCL, supplies plain-MPI reference collective implementations, optionally instrumentable, and records the system configuration for reproducible comparisons. Evaluated on three major supercomputers, PICO shows…
