High-Accuracy Low-Precision Training
Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

TL;DR
This paper introduces HALP, a low-precision stochastic gradient descent method that maintains convergence rates comparable to full-precision algorithms by using variance reduction and bit centering, enabling faster and more efficient training.
Contribution
The paper presents HALP, a low-precision training algorithm that combines SVRG-based variance reduction with bit centering to match the theoretical convergence rate of full-precision algorithms.
Findings
HALP can run up to 4x faster than full-precision SVRG on CPU.
HALP matches the convergence trajectory of full-precision algorithms.
HALP outperforms plain low-precision SGD on deep learning tasks.
Abstract
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference - not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, which limits statistical accuracy. To address this issue, we describe a simple low-precision stochastic gradient descent variant called HALP. HALP converges at the same theoretical rate as full-precision algorithms despite the noise introduced by using low precision throughout execution. The key idea is to use SVRG to reduce gradient variance, and to combine this with a novel technique called bit centering to reduce quantization error. We show that on the CPU, HALP can run up to …
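The two ingredients named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it runs an SVRG-style loop on a least-squares problem, keeps a full-precision center w, and stores only a low-precision offset z on a fixed-point grid that is rescaled each epoch. The function names (quantize, halp_sketch), the step size, the epoch counts, and in particular the grid-scale rule are assumptions made for illustration. All arithmetic is simulated in float64, so it shows the convergence behavior only, not the speedup.

```python
import numpy as np


def quantize(x, scale, bits, rng):
    """Stochastically round x onto the grid {scale * k}, k a signed `bits`-bit integer."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    y = x / scale
    base = np.floor(y)
    # Round up with probability equal to the fractional part, so rounding
    # is unbiased for values inside the representable range.
    k = base + (rng.random(x.shape) < (y - base))
    return scale * np.clip(k, lo, hi)


def halp_sketch(A, b, mu, alpha=0.01, bits=8, epochs=40, inner=1000, seed=0):
    """Simulated low-precision SVRG with a re-centered offset, for least squares."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)  # full-precision center
    for _ in range(epochs):
        g_full = A.T @ (A @ w - b) / n  # full gradient at the center
        g_norm = np.linalg.norm(g_full)
        if g_norm == 0.0:
            break
        # ASSUMPTION: choose the grid so its range is about g_norm / mu, which
        # bounds the distance to the optimum for a mu-strongly convex objective.
        # The grid therefore gets finer as the center approaches the optimum.
        scale = g_norm / (mu * (2 ** (bits - 1) - 1))
        z = np.zeros(d)  # low-precision offset from the center
        for _ in range(inner):
            a = A[rng.integers(n)]
            # SVRG direction: grad_i(w + z) - grad_i(w) + g_full.
            # For f_i(w) = 0.5 * (a.w - b_i)^2 the difference is a * (a.z).
            v = a * (a @ z) + g_full
            z = quantize(z - alpha * v, scale, bits, rng)
        w = w + z  # re-center in full precision
    return w


# Hypothetical test problem: noise-free least squares with a known solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = A @ w_true
mu = np.linalg.eigvalsh(A.T @ A / A.shape[0])[0]  # smallest eigenvalue
w_hat = halp_sketch(A, b, mu)
rel_err = np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true)
```

Because the grid is rescaled around each new center, the 8-bit offset can keep resolving smaller and smaller corrections. A fixed grid would instead stall at its own resolution, which is the tradeoff the abstract attributes to earlier low-precision training methods.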
Taxonomy
Topics: Medical Imaging and Analysis · Radiomics and Machine Learning in Medical Imaging · 3D Shape Modeling and Analysis
Methods: Stochastic Gradient Descent
