UCCL-Zip: Lossless Compression Supercharged GPU Communication

Shuang Ma; Chon Lam Lao; Zhiying Xu; Zhuang Wang; Ziming Mao; Delong Meng; Jia Zhen; Jun Wu; Ion Stoica; Yida Wang; Yang Zhou

arXiv:2604.17172·cs.DC·April 23, 2026

TL;DR

UCCL-Zip adds lossless compression to GPU communication, speeding up large language model workloads without changing numerical results or requiring API modifications.

Contribution

UCCL-Zip presents a unified lossless compression framework integrated directly into GPU communication primitives, supporting both point-to-point (P2P) and collective operations.

Findings

01 Accelerates RL weight synchronization by up to 47.5%.

02 Reduces vLLM inference latency by up to 10%.

03 Maintains numerical correctness without application changes.

Abstract

The rapid growth of large language models (LLMs) has made GPU communication a critical bottleneck. While prior work reduces communication volume via quantization or lossy compression, these approaches introduce numerical errors that can degrade convergence, accuracy, and stability. We present UCCL-Zip, a unified design that integrates lossless compression directly into GPU communication primitives. UCCL-Zip supports both point-to-point (P2P) and collective communication without modifying user-facing APIs or compromising numerical correctness. For P2P communication, Uzip-P2P employs a split-send pipeline that exposes transmissible data early and overlaps compression with communication, while preserving high GPU efficiency by operating on large data blocks. For collective communication, Uzip-NCCL integrates compression into NCCL's persistent kernel model via fused execution, eliminating…
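
The split-send idea described in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' implementation: `zlib` stands in for a GPU compressor, a bounded `queue.Queue` stands in for the network link, a thread stands in for the receiving peer, and the names `split_send`, `receive`, `roundtrip`, and the chunk size are all hypothetical. It only shows the general pattern of compressing a buffer chunk by chunk so that early chunks can be transmitted while later ones are still being compressed, and that the received bytes are bit-identical to the original.

```python
import queue
import struct
import threading
import zlib


def split_send(payload: bytes, channel: queue.Queue, chunk_bytes: int) -> None:
    """Compress one chunk at a time and hand it to the channel immediately.

    Early chunks become transmissible while later chunks are still
    waiting to be compressed.
    """
    for start in range(0, len(payload), chunk_bytes):
        chunk = payload[start:start + chunk_bytes]
        channel.put(zlib.compress(chunk))  # zlib is a stand-in compressor
    channel.put(None)  # end-of-stream marker


def receive(channel: queue.Queue) -> bytes:
    """Decompress chunks in arrival order and reassemble the buffer."""
    parts = []
    while True:
        item = channel.get()
        if item is None:
            break
        parts.append(zlib.decompress(item))
    return b"".join(parts)


def roundtrip(payload: bytes, chunk_bytes: int = 1024) -> bytes:
    """Run sender and receiver concurrently over a bounded channel."""
    channel = queue.Queue(maxsize=4)  # bounded, like a finite send window
    result = []
    consumer = threading.Thread(target=lambda: result.append(receive(channel)))
    consumer.start()
    split_send(payload, channel, chunk_bytes)
    consumer.join()
    return result[0]


# Toy "weights": 1000 float32 values packed into 4000 bytes.
weights = struct.pack("<1000f", *[i * 0.001 for i in range(1000)])
restored = roundtrip(weights)
print(restored == weights)
```

Because the compressor is lossless, the comparison is on raw bytes rather than on approximate floating-point values, which is the property that distinguishes this approach from quantization or lossy compression.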
