TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network

Guangxin He; Yuan Cao; Yutong He; Tianyi Bai; Kai Chen; Kun Yuan; Binhang Yuan

arXiv:2506.01352·cs.LG·May 12, 2026

TL;DR

TAH-Quant is a novel activation quantization framework for pipeline parallelism that significantly reduces communication overhead and accelerates training of large language models over slow networks.

Contribution

The paper introduces tile-wise adaptive quantization with entropy-guided bit allocation and a Hadamard-based transform, quantizing activations to 3-4 bits while maintaining convergence and delivering up to 4.3x throughput speedup over FP32.
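To make the idea of tile-wise, entropy-guided bit allocation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the histogram-based entropy estimate, the rule "higher entropy gets 4 bits, otherwise 3", the min-max uniform quantizer, and the names `tile_entropy`, `quantize_tile`, `quantize_activations`, `tile_size`, and `threshold` are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def tile_entropy(tile, bins=16):
    """Shannon entropy (in bits) of a histogram over the tile's values.
    Assumption: a simple histogram estimate stands in for the paper's entropy measure."""
    lo, hi = min(tile), max(tile)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in tile)
    n = len(tile)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def quantize_tile(tile, bits):
    """Uniform min-max quantization of one tile; returns (codes, offset, scale)."""
    lo, hi = min(tile), max(tile)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in tile]
    return codes, lo, scale

def dequantize_tile(codes, lo, scale):
    """Map integer codes back to approximate floating-point values."""
    return [lo + c * scale for c in codes]

def quantize_activations(values, tile_size=8, threshold=2.0):
    """Split a flat activation list into tiles and pick a bit width per tile.
    Assumption: tiles whose entropy exceeds `threshold` get 4 bits, the rest 3."""
    packed = []
    for i in range(0, len(values), tile_size):
        tile = values[i:i + tile_size]
        bits = 4 if tile_entropy(tile) > threshold else 3
        packed.append((bits, *quantize_tile(tile, bits)))
    return packed
```

Each tile carries its own offset and scale, so an extreme value in one tile does not widen the quantization step of its neighbours. That locality is the usual motivation for working at tile granularity instead of per tensor.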

Findings

01 Quantizes activations to 3-4 bits.

02 Provides up to 4.3x throughput speedup over FP32.

03 Maintains a convergence rate comparable to that of SGD.

Abstract

Decentralized training of large language models offers the opportunity to pool computational resources across geographically distributed participants, but is often bottlenecked by network communication, particularly under pipeline parallel settings. While pipeline parallelism partitions model layers across devices to handle large-scale models, it necessitates frequent communication of intermediate activations, creating challenges when network bandwidth is limited. To address these issues, we propose TAH-Quant (Tile-wise Adaptive Hadamard Quantization), a novel activation quantization framework for pipeline parallelism. TAH-Quant integrates fine-grained tile-wise quantization, entropy-guided tile-wise adaptive bit allocation for optimal bit usage, and a Hadamard-based transformation with pivot swapping to effectively suppress outliers. Compared with token-level allocation, the tile-wise…
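The outlier-suppression role of a Hadamard transform can be illustrated with a short plain-Python sketch. It assumes an orthonormal fast Walsh-Hadamard transform applied to a single power-of-two-length vector; the function name `fwht` is hypothetical, and the pivot-swapping step mentioned in the abstract is not shown because the text above does not describe it.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform.
    len(x) must be a power of two; the transform is its own inverse."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h positions apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]

# One large outlier among 16 entries is spread evenly across all coordinates.
x = [16.0] + [0.0] * 15
y = fwht(x)
print(max(abs(v) for v in x), max(abs(v) for v in y))
```

Because the transform is orthonormal, it preserves the vector's energy while flattening its dynamic range, so a low-bit uniform quantizer applied afterwards wastes fewer levels on a single extreme value. The receiver can apply the same transform again to return to the original basis.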
