FlashCommunication V2: Bit Splitting and Spike Reserving for Any Bit Communication

Qingyuan Li; Bo Zhang; Hui Kang; Tianhao Xu; Yulei Qian; Yuchen Xie; Lin Ma

arXiv:2508.03760 · cs.DC · August 7, 2025

TL;DR

FlashCommunication V2 introduces bit splitting and spike reserving to enable efficient, low-overhead cross-GPU communication at arbitrary bit widths, targeting the communication bottlenecks in distributed training and deployment of large language models.

Contribution

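The paper proposes two techniques for low-bit quantized communication across GPUs: bit splitting, which decomposes irregular bit widths into basic units that the hardware can handle, and spike reserving, which keeps the minima and maxima as floating-point numbers to shrink the range that must be quantized.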

Findings

01  Achieves up to 3.2× speedup in AllReduce operations.

02  Enables 2-bit quantization with acceptable accuracy loss.

03  Demonstrates robustness across NVLink and PCIe architectures.

Abstract

Nowadays, communication bottlenecks have emerged as a critical challenge in the distributed training and deployment of large language models (LLMs). This paper introduces FlashCommunication V2, a novel communication paradigm enabling efficient cross-GPU transmission at arbitrary bit widths. Its core innovations lie in the proposed bit splitting and spike reserving techniques, which address the challenges of low-bit quantization. Bit splitting decomposes irregular bit widths into basic units, ensuring compatibility with hardware capabilities and thus enabling transmission at any bit width. Spike reserving, on the other hand, retains numerical outliers (i.e., minima and maxima) as floating-point numbers, which shrinks the dynamic numerical range and pushes the quantization limits to 2-bit with acceptable losses. FlashCommunication V2 significantly enhances the flexibility and resource…
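
The two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 6-bit = 4-bit + 2-bit decomposition, the use of a whole array as one quantization group, and the CPU-side NumPy code (standing in for GPU kernels) are all assumptions made for illustration. It shows a 6-bit payload being split into two planes that each pack evenly into bytes, and a 2-bit quantizer that stores the minimum and maximum separately as floats.

```python
import numpy as np

# Hypothetical helpers for illustration only; names and layout are assumptions.

def split_bits(q, widths=(4, 2)):
    """Split unsigned codes of sum(widths) bits into planes, high bits first."""
    planes, shift = [], sum(widths)
    for w in widths:
        shift -= w
        planes.append(((q >> shift) & ((1 << w) - 1)).astype(np.uint8))
    return planes

def merge_bits(planes, widths=(4, 2)):
    """Inverse of split_bits: reassemble the original codes."""
    out = np.zeros(planes[0].shape, dtype=np.uint16)
    for p, w in zip(planes, widths):
        out = (out << w) | p
    return out

def pack_plane(plane, width):
    """Pack 8 // width codes per byte; assumes the length divides evenly."""
    per_byte = 8 // width
    shifts = np.arange(per_byte, dtype=np.uint8) * width
    lanes = plane.reshape(-1, per_byte).astype(np.uint8) << shifts
    return np.bitwise_or.reduce(lanes, axis=1).astype(np.uint8)

def unpack_plane(packed, width):
    """Inverse of pack_plane."""
    per_byte = 8 // width
    shifts = np.arange(per_byte, dtype=np.uint8) * width
    return ((packed[:, None] >> shifts) & ((1 << width) - 1)).reshape(-1)

def spike_reserve_quantize(x, bits=2):
    """Keep the min and max as floats; quantize the rest over the reduced range.

    Assumes x is one group with more than two elements.
    """
    x = np.asarray(x, dtype=np.float32)
    i_min, i_max = int(np.argmin(x)), int(np.argmax(x))
    spikes = {i_min: float(x[i_min]), i_max: float(x[i_max])}
    keep = np.ones(x.size, dtype=bool)
    keep[[i_min, i_max]] = False
    lo, hi = float(x[keep].min()), float(x[keep].max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    return q, lo, scale, spikes

def spike_reserve_dequantize(q, lo, scale, spikes):
    """Reconstruct values, then restore the reserved spikes exactly."""
    x = q.astype(np.float32) * scale + lo
    for i, v in spikes.items():
        x[i] = v
    return x
```

With 64 six-bit codes, `pack_plane(planes[0], 4)` and `pack_plane(planes[1], 2)` yield 32 and 16 bytes, which together match the 48 bytes of a dense 6-bit encoding. For the quantizer, removing two spikes such as 50.0 and -40.0 from a group whose other values lie in [-0.2, 0.3] shrinks the 2-bit step from 30 to about 0.17 in this toy setting.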
