SWALP : Stochastic Weight Averaging in Low-Precision Training

Guandao Yang; Tianyi Zhang; Polina Kirichenko; Junwen Bai; Andrew Gordon Wilson; Christopher De Sa

arXiv:1904.11943 · cs.LG · May 21, 2019 · 23 cites

TL;DR

SWALP is a low-precision training method that averages low-precision SGD iterates under a modified learning rate schedule. It matches full-precision SGD performance with 8-bit quantization and comes with convergence guarantees.

Contribution

The paper presents SWALP, a simple low-precision training approach that matches full-precision SGD performance with all numbers quantized to 8 bits, including the gradient accumulators, and provides a convergence analysis.

Findings

01

SWALP matches full-precision SGD performance with 8-bit quantization.

02

SWALP converges arbitrarily close to the optimal solution for quadratic objectives.

03

In strongly convex settings, SWALP converges to a noise ball asymptotically smaller than that of low-precision SGD.

Abstract

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.
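The idea in the abstract, averaging low-precision SGD iterates, can be illustrated with a minimal sketch on a toy quadratic objective. This is not the authors' implementation. The function names (`quantize_stochastic`, `swalp_quadratic`), the 8-bit fixed-point format with 6 fractional bits, the use of stochastic rounding, the constant learning rate standing in for the modified schedule, the `warmup` and `cycle` parameters, and keeping the running average in full precision are all assumptions made for illustration.

```python
import numpy as np


def quantize_stochastic(x, word_bits=8, frac_bits=6, rng=None):
    """Round x onto a fixed-point grid using stochastic (unbiased) rounding.

    The format (8-bit word, 6 fractional bits) is an illustrative assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = 2.0 ** -frac_bits                      # grid spacing
    lo = -(2.0 ** (word_bits - frac_bits - 1))     # smallest representable value
    hi = -lo - delta                               # largest representable value
    scaled = np.asarray(x, dtype=np.float64) / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part.
    rounded = floor + (rng.random(scaled.shape) < (scaled - floor))
    return np.clip(rounded * delta, lo, hi)


def swalp_quadratic(w_star, steps=20000, warmup=2000, cycle=10, lr=0.05, seed=0):
    """Low-precision SGD on 0.5 * ||w - w_star||^2 with iterate averaging.

    Returns the last low-precision iterate and the running average.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_star)        # low-precision weights
    w_avg = np.zeros_like(w_star)    # running average (full precision, assumed)
    n_avg = 0
    for t in range(steps):
        # Noisy gradient of the quadratic objective.
        grad = (w - w_star) + 0.1 * rng.standard_normal(w.shape)
        # SGD step, then quantize the weights back onto the grid.
        w = quantize_stochastic(w - lr * grad, rng=rng)
        # After the warm-up phase, fold every `cycle`-th iterate into the average.
        if t >= warmup and (t - warmup) % cycle == 0:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg
    return w, w_avg


w_star = np.array([0.3, -0.7, 1.1])  # optimum that does not lie on the grid
w_last, w_avg = swalp_quadratic(w_star)
err_last = np.linalg.norm(w_last - w_star)
err_avg = np.linalg.norm(w_avg - w_star)
print(f"last-iterate error: {err_last:.4f}, averaged error: {err_avg:.4f}")
```

In this toy setting the last iterate is confined to the quantization grid and keeps jittering around the optimum, while the average of the iterates can land between grid points, which is the intuition behind the smaller noise ball described above.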

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Neural Networks and Applications

Methods: Stochastic Gradient Descent