Lion Cub: Minimizing Communication Overhead in Distributed Lion
Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden

TL;DR
Lion Cub introduces a communication-efficient distributed training method for the Lion optimizer, achieving up to 5x speedups by combining tailored quantization and selective momentum synchronization.
Contribution
The paper presents Lion Cub, a novel approach that reduces communication overhead in distributed Lion training through optimized quantization and momentum synchronization techniques.
Findings
Up to 5x speedup in training time.
Effective quantization methods for Lion.
Reduced communication costs without sacrificing convergence.
Abstract
Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techniques have been explored for SGD and Adam, the Lion optimizer has the distinct advantage that its update vectors are the output of a sign operation, enabling straightforward quantization. However, simply compressing updates for communication and using techniques like majority voting fails to lead to end-to-end speedups due to inefficient communication algorithms and reduced convergence. We analyze three factors critical to distributed learning with Lion: optimizing communication methods, identifying effective quantization methods, and assessing the necessity of momentum synchronization. Our findings show that quantization techniques adapted to Lion and…
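The sign-based update and the majority-voting baseline mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes the standard Lion update direction, sign(beta1 * m + (1 - beta1) * g), and the helper names (`local_sign_update`, `majority_vote`, `pack_signs`), the three workers, and their gradient values are all hypothetical. It shows only why sign outputs pack into 1 bit per parameter, and what naive voting computes, which the abstract says is not enough on its own for end-to-end speedups.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Return -1, 0, or +1."""
    return (x > 0) - (x < 0)


def local_sign_update(momentum: Sequence[float], grad: Sequence[float],
                      beta1: float = 0.9) -> List[int]:
    """Lion's update direction on one worker (assumed standard form):
    the sign of an interpolation between momentum and gradient."""
    return [sign(beta1 * m + (1 - beta1) * g) for m, g in zip(momentum, grad)]


def majority_vote(worker_signs: Sequence[Sequence[int]]) -> List[int]:
    """Elementwise majority vote: the sign of the sum of all workers' signs.
    A tie yields 0."""
    return [sign(sum(column)) for column in zip(*worker_signs)]


def pack_signs(signs: Sequence[int]) -> bytes:
    """Pack a sign vector into 1 bit per element (1 = positive, 0 otherwise).
    Zeros are mapped to the 0 bit, so ties are not preserved."""
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Hypothetical data: three workers, four parameters, zero initial momentum.
momenta = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
grads = [
    [0.5, -1.0, 0.2, -0.3],
    [0.4, 1.0, -0.1, -0.2],
    [-0.6, -2.0, 0.3, -0.1],
]

votes = [local_sign_update(m, g) for m, g in zip(momenta, grads)]
agreed = majority_vote(votes)   # [1, -1, 1, -1]
payload = pack_signs(agreed)    # a single byte for four parameters
```

In this toy case the four-parameter update fits in one byte, compared with 16 bytes for four 32-bit floats. The sketch says nothing about which collective communication algorithm carries the packed bits, which is one of the factors the abstract identifies as critical.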
Taxonomy
Topics: IoT and Edge/Fog Computing · Cloud Computing and Resource Management · Mobile Agent-Based Network Management
Methods: Adam · Evolved Sign Momentum · Stochastic Gradient Descent
