NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

Jiamin Wang; Zhijing Ye; Xiaodong Yu

arXiv:2605.12396 · cs.DC · May 13, 2026

TL;DR

NCCLZ speeds up GPU collective communication by decoupling quantization from entropy coding. This enables flexible, efficient compression that accelerates multi-node GPU workloads, by up to 9.65x over NCCL.

Contribution

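It places quantization and entropy coding in different layers of the collective stack: quantization at the interface and entropy coding inside NCCL primitives. This improves compression ratio and flexibility, and it increases the overlap of compression with communication in GPU collectives.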
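The sketch below shows the separation of the two layers as independent stages that meet at a narrow interface: quantization turns floats into small integers, and any lossless coder can sit behind that interface. It assumes an error-bounded linear quantizer, which is a common choice in scientific lossy compressors. Python's `zlib` (DEFLATE, which includes Huffman coding) stands in for the entropy-coding stage. All function names are hypothetical, and this is not NCCLZ's implementation.

```python
import struct
import zlib
from typing import List


def quantize(values: List[float], error_bound: float) -> List[int]:
    """Hypothetical stage 1: error-bounded linear quantization.

    Each value maps to the nearest multiple of 2 * error_bound, so the
    reconstruction error is at most error_bound (up to float rounding).
    """
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes: List[int], error_bound: float) -> List[float]:
    """Inverse of quantize: scale integer codes back to floats."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def entropy_encode(codes: List[int]) -> bytes:
    """Hypothetical stage 2: lossless coding of the integer codes.

    Codes are packed as little-endian int32 (they must fit in 32 bits),
    then zlib stands in for a GPU entropy coder.
    """
    raw = struct.pack(f"<{len(codes)}i", *codes)
    return zlib.compress(raw)


def entropy_decode(blob: bytes) -> List[int]:
    """Undo entropy_encode: decompress, then unpack the int32 codes."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))
```

Only `entropy_encode` and `entropy_decode` touch bytes, so the coder can be replaced without changing the quantizer. A runtime can exploit this by placing the two stages at different layers.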

Findings

1. Achieves up to 9.65x speedup over NCCL.
2. Provides up to 3.34x improvement over prior compression libraries.
3. Overlaps compression with communication to reduce compression overhead.
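A simple two-stage pipeline model shows why overlapping compression with communication helps. This model is assumed here for illustration and is not taken from the paper. Suppose a message is split into `chunks` equal pieces, so compressing chunk i+1 runs while chunk i is on the wire. If C is the compression time and S the transfer time of the compressed payload, serial execution costs C + S. The pipeline instead costs C/k + S/k + (k - 1) · max(C, S)/k, which approaches max(C, S) as k grows. The function names are hypothetical, and receiver-side decompression and per-chunk overheads are ignored.

```python
def serial_time(compress_s: float, send_s: float) -> float:
    """No overlap: compress the whole message, then send it."""
    return compress_s + send_s


def pipelined_time(compress_s: float, send_s: float, chunks: int) -> float:
    """Two-stage pipeline over `chunks` equal chunks.

    The first chunk pays both stages; every later chunk adds only the
    slower stage, because compressing chunk i+1 overlaps sending chunk i.
    Per-chunk fixed costs (launch latency, headers) are ignored.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    c = compress_s / chunks
    s = send_s / chunks
    return c + s + (chunks - 1) * max(c, s)
```

For example, with hypothetical values C = 2 ms and S = 8 ms, the serial cost is 10 ms and 8 chunks bring it to 8.25 ms. In practice, per-chunk overheads bound how finely a message can usefully be split.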

Abstract

Collective communication is a major bottleneck for multi-node GPU workloads in scientific computing and distributed deep learning, especially when inter-node bandwidth is limited. Although NCCL provides optimized GPU-centric collectives, large messages can still dominate end-to-end performance. Existing compression-enabled collective libraries have one of three limitations: they rely on MPI-based stacks that cannot fully exploit NCCL, they omit entropy coding, or they tightly couple full compressors with communication primitives. These designs limit compression ratio, flexibility, and communication-computation overlap. This paper presents NCCLZ, a compression-enabled GPU collectives library that decouples quantization and entropy coding and integrates them at different layers of the stack. NCCLZ places quantization at the interface, embeds entropy coding into NCCL primitives, uses a lightweight device-side selector to choose coding strategies,…
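The abstract mentions a lightweight device-side selector that chooses coding strategies, but this excerpt does not say what it measures. The sketch below illustrates one plausible heuristic, which is purely an assumption. It estimates the Shannon entropy of the quantized codes from a histogram, compares the estimated size with plain fixed-width bit-packing, and picks the cheaper option. The strategy names, the per-symbol table cost, and the decision rule are all hypothetical. The code runs on the host in Python for readability, whereas a device-side version would build the histogram in parallel on the GPU.

```python
import math
from collections import Counter
from typing import List, Tuple


def fixed_width_bits(codes: List[int]) -> int:
    """Bits per symbol to bit-pack codes after subtracting the minimum (at least 1)."""
    span = max(codes) - min(codes)
    return max(1, span.bit_length())


def entropy_bits(codes: List[int]) -> float:
    """Empirical Shannon entropy of the code histogram, in bits per symbol."""
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(codes).values())


def select_strategy(codes: List[int], table_bits_per_symbol: int = 32) -> Tuple[str, float]:
    """Hypothetical selector: pick the cheaper of fixed-width bit-packing and an
    entropy coder, estimating the latter as n * H (a lower bound for Huffman)
    plus a code table of `table_bits_per_symbol` bits per distinct symbol."""
    if not codes:
        return "bitpack", 0.0
    n = len(codes)
    packed_bits = n * fixed_width_bits(codes)
    distinct = len(set(codes))
    entropy_est = n * entropy_bits(codes) + distinct * table_bits_per_symbol
    if entropy_est < packed_bits:
        return "entropy", entropy_est
    return "bitpack", float(packed_bits)
```

Skewed code distributions, such as the mostly-zero codes that quantization of smooth data tends to produce, favor entropy coding. Near-uniform codes favor bit-packing, because the code table then costs more than it saves.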
