RATQ: A Universal Fixed-Length Quantizer for Stochastic Optimization

Prathamesh Mayekar; Himanshu Tyagi

arXiv:1908.08200 · cs.LG · December 17, 2019

TL;DR

RATQ is a universal fixed-length gradient quantizer for stochastic optimization. It nearly attains information-theoretic lower bounds on optimization accuracy, performs better when paired with an adaptive (rather than uniform) gain quantizer, and is effective for Gaussian and subgaussian vectors.

Contribution

Introduces RATQ, a simple, universal fixed-length quantizer for gradients that nearly attains information-theoretic lower bounds, and shows that pairing it with an adaptive gain quantizer further improves performance.

Findings

01

RATQ nearly attains the information-theoretic lower bound on optimization accuracy.

02

Used with RATQ as the shape quantizer, the adaptive gain quantizer outperforms uniform gain quantization.

03

RATQ performs close to the optimal variable-length quantizers for distributed mean estimation.

Abstract

We present Rotated Adaptive Tetra-iterated Quantizer (RATQ), a fixed-length quantizer for gradients in first-order stochastic optimization. RATQ is easy to implement and involves only a Hadamard transform computation and adaptive uniform quantization with appropriately chosen dynamic ranges. For noisy gradients with almost surely bounded Euclidean norms, we establish an information-theoretic lower bound for optimization accuracy using finite-precision gradients and show that RATQ almost attains this lower bound. For mean-square bounded noisy gradients, we use a gain-shape quantizer which separately quantizes the Euclidean norm and uses RATQ to quantize the normalized unit-norm vector. We establish lower bounds on the performance of any optimization procedure and shape quantizer when used with a uniform gain quantizer. Finally, we propose an adaptive quantizer for gain which when used…
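The abstract describes RATQ as a Hadamard rotation followed by adaptive uniform quantization. The Python sketch that follows illustrates that pipeline under assumptions not stated on this page. The rotation is taken as (1/√d)·H·D with random ±1 signs D shared by encoder and decoder. The rotated vector is split into blocks of s coordinates. Each block picks the smallest of h dynamic ranges M_j = √(m · tet_e(j)) that covers it in ℓ∞, where tet_e(j) is the j-fold tetration of e. Each coordinate is then stochastically rounded onto k uniform levels. All function names (`ratq`, `make_ranges`, `atq_encode`, `ratq_bits`) and parameters (m, h, k, s) are hypothetical, and the paper's exact constants may differ.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    y = list(x)
    n = len(y)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2 (pad with zeros otherwise)"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y


def rotate(x, signs):
    """Randomized Hadamard rotation R x = H D x / sqrt(d)."""
    d = len(x)
    return [v / math.sqrt(d) for v in fwht([s * v for s, v in zip(signs, x)])]


def inverse_rotate(y, signs):
    """R^T y = D H y / sqrt(d); R is orthogonal because H H = d I and D D = I."""
    d = len(y)
    return [s * v / math.sqrt(d) for s, v in zip(signs, fwht(y))]


def tet_e(j):
    """j-fold tetration of e: 1, e, e^e, e^(e^e), ..."""
    t = 1.0
    for _ in range(j):
        t = math.exp(t)
    return t


def make_ranges(m, h):
    """Hypothetical tetra-iterated dynamic ranges M_j = sqrt(m * tet_e(j)), j = 0..h-1."""
    assert 1 <= h <= 4, "tetration of e overflows a float beyond j = 3"
    return [math.sqrt(m * tet_e(j)) for j in range(h)]


def atq_encode(block, ranges, k, rng):
    """Pick the smallest range covering the block in l_inf, then round each
    coordinate onto k uniform levels in [-M, M] with unbiased stochastic rounding."""
    peak = max(abs(v) for v in block)
    j = next((i for i, M in enumerate(ranges) if peak <= M), len(ranges) - 1)
    M = ranges[j]
    step = 2 * M / (k - 1)
    codes = []
    for v in block:
        v = min(max(v, -M), M)  # clips only if no range covers the block (then biased)
        pos = (v + M) / step
        low = min(math.floor(pos), k - 2)
        codes.append(low + (1 if rng.random() < pos - low else 0))
    return j, codes


def atq_decode(j, codes, ranges, k):
    """Map level indices back to values in [-M_j, M_j]."""
    M = ranges[j]
    step = 2 * M / (k - 1)
    return [-M + c * step for c in codes]


def ratq(x, m, h, k, s, rng):
    """Quantize x and return the decoder's reconstruction (for illustration)."""
    assert k >= 2, "need at least two quantization levels"
    d = len(x)
    signs = [rng.choice((-1, 1)) for _ in range(d)]  # assumed shared randomness
    ranges = make_ranges(m, h)
    y = rotate(x, signs)
    y_hat = []
    for start in range(0, d, s):
        j, codes = atq_encode(y[start:start + s], ranges, k, rng)
        y_hat.extend(atq_decode(j, codes, ranges, k))
    return inverse_rotate(y_hat, signs)


def ratq_bits(d, h, k, s):
    """Fixed code length: one range index plus one level index per coordinate, per block."""
    return sum(math.ceil(math.log2(h)) + min(s, d - start) * math.ceil(math.log2(k))
               for start in range(0, d, s))


rng = random.Random(0)
x = [0.3, -0.2, 0.1, 0.5, -0.4, 0.0, 0.2, -0.1]
print(ratq_bits(len(x), h=3, k=4, s=4))  # → 20
estimate = ratq(x, m=1.0, h=3, k=4, s=4, rng=rng)
```

Every block spends ⌈log₂ h⌉ bits on its range index plus s·⌈log₂ k⌉ bits on its coordinates, so the code length does not depend on the input. That is the fixed-length property. The rotation spreads a vector's energy evenly across coordinates, which keeps the ℓ∞ norm of each block small and lets a small range usually suffice. Stochastic rounding keeps the reconstruction unbiased whenever the largest range covers the block.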
