SWALP: Stochastic Weight Averaging in Low-Precision Training
Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa

TL;DR
SWALP is a low-precision training method that averages low-precision SGD iterates under a modified learning rate schedule. It can match full-precision SGD performance with 8-bit quantization and comes with convergence guarantees for quadratic and strongly convex objectives.
Contribution
The paper presents SWALP, a simple low-precision training approach that can match full-precision SGD performance with all numbers quantized to 8 bits, and it provides a convergence analysis of the method.
Findings
SWALP can match full-precision SGD performance even with all numbers, including the gradient accumulators, quantized to 8 bits.
SWALP converges arbitrarily close to the optimal solution for quadratic objectives.
In strongly convex settings, SWALP converges to a noise ball asymptotically smaller than that of low-precision SGD.
Abstract
Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low-precision SGD in strongly convex settings.
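The idea in the abstract can be illustrated on a toy quadratic. The following is a minimal sketch, not the authors' implementation. It assumes an 8-bit fixed-point format with stochastic rounding, a constant learning rate after a warm-up phase (a simplification of the modified schedule), and a running average kept in full precision. The names stochastic_round and swalp_quadratic, the word and fraction bit widths, and the objective are all hypothetical choices made for illustration.

```python
import numpy as np

WORD_BITS = 8   # assumed total bits per number
FRAC_BITS = 6   # assumed fractional bits, so the grid spacing is 2**-6


def stochastic_round(x, rng, word_bits=WORD_BITS, frac_bits=FRAC_BITS):
    """Quantize x to signed fixed point using unbiased stochastic rounding."""
    delta = 2.0 ** -frac_bits
    lo = -(2.0 ** (word_bits - frac_bits - 1))
    hi = -lo - delta
    scaled = np.asarray(x, dtype=np.float64) / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional remainder.
    rounded = floor + (rng.random(scaled.shape) < (scaled - floor))
    return np.clip(rounded * delta, lo, hi)


def swalp_quadratic(steps=2000, warmup=200, cycle=1, lr=0.1, seed=0):
    """Low-precision SGD on 0.5 * ||w - w_star||^2 with iterate averaging."""
    rng = np.random.default_rng(seed)
    w_star = np.array([0.3, -0.7])        # optimum, deliberately off the grid
    w = stochastic_round(np.zeros(2), rng)
    w_avg = np.zeros(2)                   # running average, full precision
    n_avg = 0
    for t in range(steps):
        # Noisy gradient of the quadratic, quantized like every other number.
        grad = (w - w_star) + 0.1 * rng.standard_normal(2)
        grad = stochastic_round(grad, rng)
        # Low-precision SGD step: the weights themselves live on the grid.
        w = stochastic_round(w - lr * grad, rng)
        # After warm-up, fold every `cycle`-th iterate into the average.
        if t >= warmup and (t - warmup) % cycle == 0:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg
    return w, w_avg, w_star


if __name__ == "__main__":
    w_last, w_avg, w_star = swalp_quadratic()
    print("last-iterate error:", np.linalg.norm(w_last - w_star))
    print("averaged error:    ", np.linalg.norm(w_avg - w_star))
```

In this sketch the last iterate is confined to the quantization grid, so it cannot land exactly on an off-grid optimum, while the average of many grid points can fall between them. That is the intuition behind averaging low-precision iterates; the formal guarantees are the ones stated in the paper.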
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
Methods: Stochastic Gradient Descent
