TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
Guangxin He, Yuan Cao, Yutong He, Tianyi Bai, Kai Chen, Kun Yuan, Binhang Yuan

TL;DR
TAH-Quant is a novel activation quantization framework for pipeline parallelism that compresses the activations exchanged between pipeline stages, reducing communication overhead and accelerating the training of large language models over slow networks.
Contribution
It introduces tile-wise adaptive quantization with entropy-guided bit allocation and a Hadamard transform for outlier suppression, maintaining convergence while quantizing activations to 3 to 4 bits and delivering substantial speedups.
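The Hadamard step can be illustrated with a minimal sketch, assuming an orthonormal fast Walsh-Hadamard transform applied independently to each tile whose length is a power of two. The function name `fwht` and the example tile are hypothetical, and the paper's pivot-swapping step is not modeled here. Because the normalized Hadamard matrix is symmetric and orthogonal, applying it twice recovers the input, and a single large outlier gets spread across every position of the tile, which shrinks the range a per-tile quantizer has to cover.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (H / sqrt(n)) of a list.

    The length must be a power of two. Because H / sqrt(n) is symmetric and
    orthogonal, calling fwht twice returns the original values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("tile length must be a power of two")
    y = [float(v) for v in x]
    h = 1
    while h < n:
        # Butterfly pass: combine pairs of entries that are h positions apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in y]


# Hypothetical activation tile with one large outlier at index 3.
tile = [0.1, -0.2, 0.05, 8.0, 0.0, 0.15, -0.1, 0.02]
rotated = fwht(tile)
print(max(abs(v) for v in tile), max(abs(v) for v in rotated))
```

In this sketch only the rotated tile would be quantized and sent; the receiving stage would dequantize and call `fwht` again to return to the original basis. The peak magnitude drops because every output is a ±1 combination of all inputs scaled by 1/sqrt(n).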
Findings
Quantizes activations to 3 to 4 bits.
Provides up to a 4.3x throughput speedup over FP32.
Maintains a convergence rate comparable to that of SGD.
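The arithmetic behind a 3-to-4-bit payload can be sketched with per-tile asymmetric uniform quantization. This is a minimal illustration, assuming each tile stores one scale and one offset (counted as two FP16 values, i.e. 32 bits of metadata per tile) alongside its integer codes; the function names, the tile size of 64, and the metadata format are hypothetical and not taken from the paper.

```python
import math

def quantize_tile(tile, bits):
    """Map one tile to unsigned `bits`-bit codes using its own min and range."""
    levels = (1 << bits) - 1
    lo, hi = min(tile), max(tile)
    scale = (hi - lo) / levels if hi > lo else 1.0  # constant tiles use scale 1
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in tile]
    return codes, scale, lo

def dequantize_tile(codes, scale, lo):
    """Reconstruct approximate values; rounding error is at most scale / 2."""
    return [lo + c * scale for c in codes]

def quantize_tiles(x, tile_size, bits):
    """Split a flat activation vector into tiles and quantize each independently."""
    return [quantize_tile(x[i:i + tile_size], bits) for i in range(0, len(x), tile_size)]

def payload_bits(num_elems, tile_size, bits, meta_bits_per_tile=32):
    """Bits on the wire if codes were bit-packed and each tile carried
    scale + offset metadata (codes are plain Python ints in this sketch)."""
    num_tiles = math.ceil(num_elems / tile_size)
    return num_elems * bits + num_tiles * meta_bits_per_tile


x = [math.sin(0.37 * i) for i in range(256)]
for bits in (3, 4):
    rec = []
    for codes, scale, lo in quantize_tiles(x, 64, bits):
        rec += dequantize_tile(codes, scale, lo)
    err = max(abs(a - b) for a, b in zip(rec, x))
    total = payload_bits(len(x), 64, bits)
    print(bits, total / len(x), round(err, 4))  # bits, bits/element, max error
```

Under these assumptions, 256 elements at 4 bits occupy 1152 bits (4.5 bits per element, about 7.1x smaller than the 8192 bits of FP32), and at 3 bits they occupy 896 bits (3.5 bits per element). Smaller tiles track local ranges more tightly but pay proportionally more metadata.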
Abstract
Decentralized training of large language models offers the opportunity to pool computational resources across geographically distributed participants, but is often bottlenecked by network communication, particularly under pipeline parallel settings. While pipeline parallelism partitions model layers across devices to handle large-scale models, it necessitates frequent communication of intermediate activations, creating challenges when network bandwidth is limited. To address these issues, we propose TAH-Quant (Tile-wise Adaptive Hadamard Quantization), a novel activation quantization framework for pipeline parallelism. TAH-Quant integrates fine-grained tile-wise quantization, entropy-guided tile-wise adaptive bit allocation for optimal bit usage, and a Hadamard-based transformation with pivot swapping to effectively suppress outliers. Compared with token-level allocation, the tile-wise…
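Entropy-guided allocation can be sketched as follows, under clearly hypothetical assumptions: each tile's Shannon entropy is estimated from a 16-bin histogram of its values, and a fixed average budget is met by giving 4 bits to the highest-entropy tiles and 3 bits to the rest. The excerpt above does not spell out TAH-Quant's actual entropy estimator or allocation rule, so `tile_entropy`, `allocate_bits`, the bin count, and the ranking heuristic are illustrative placeholders only.

```python
import math

def tile_entropy(tile, num_bins=16):
    """Shannon entropy (in bits) of a histogram of the tile's values."""
    lo, hi = min(tile), max(tile)
    if hi == lo:
        return 0.0  # a constant tile carries no spread to preserve
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in tile:
        idx = min(int((v - lo) / width), num_bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    n = len(tile)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def allocate_bits(tiles, avg_bits=3.5, low=3, high=4):
    """Hypothetical rule: the highest-entropy tiles get `high` bits and the rest
    get `low`, so the mean bit-width matches avg_bits (rounded to whole tiles)."""
    k = round((avg_bits - low) / (high - low) * len(tiles))
    order = sorted(range(len(tiles)), key=lambda i: tile_entropy(tiles[i]), reverse=True)
    bits = [low] * len(tiles)
    for i in order[:k]:
        bits[i] = high
    return bits


tiles = [
    [i / 63 for i in range(64)],             # evenly spread values: high entropy
    [0.0] * 63 + [1.0],                      # almost constant: low entropy
    [math.sin(0.3 * i) for i in range(64)],  # spread over [-1, 1]: high entropy
    [0.0] * 60 + [1.0, 2.0, 3.0, 4.0],       # mostly zeros: low entropy
]
print(allocate_bits(tiles, avg_bits=3.5))
```

With a 3.5-bit average over four tiles, this rule gives the two spread-out tiles 4 bits and the two concentrated tiles 3 bits. Ranking by entropy is only one plausible reading of "entropy-guided": a tile whose values fall into a few bins loses little when quantized coarsely.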
