Layer-wise Quantization for Quantized Optimistic Dual Averaging

Anh Duc Nguyen; Ilia Markov; Frank Zhengqing Wu; Ali Ramezani-Kebrya; Kimon Antonakopoulos; Dan Alistarh; Volkan Cevher

arXiv:2505.14371·cs.LG·May 21, 2025

TL;DR

This paper introduces a layer-wise quantization framework and a novel Quantized Optimistic Dual Averaging (QODA) algorithm for distributed variational inequalities, improving the training efficiency of deep neural networks with heterogeneous layers.

Contribution

It proposes a general layer-wise quantization method with tight bounds and a new QODA algorithm with adaptive learning rates for monotone variational inequalities.

Findings

01

QODA achieves up to a 150% speedup over the baselines in end-to-end training time for Wasserstein GANs on 12+ GPUs.

02

Layer-wise quantization adapts to heterogeneity across neural network layers.

03

The framework provides tight variance and code-length bounds.

Abstract

Modern deep neural networks exhibit heterogeneity across numerous layers of various types, such as residuals and multi-head attention, due to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150\%$ speedup over the baselines in end-to-end training time for training Wasserstein GAN on 12+ GPUs.
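To make the idea of layer-wise quantization concrete, the following is a minimal sketch and not the paper's scheme. It assumes a generic QSGD-style unbiased stochastic uniform quantizer applied separately to each layer, with a fixed number of levels per layer; the paper's framework adapts to layer heterogeneity over the course of training, which is not modeled here. The names `quantize_layer`, `quantize_layerwise`, and `levels_per_layer` are hypothetical.

```python
import math
import random


def quantize_layer(values, num_levels, rng):
    """Unbiased stochastic uniform quantization of one layer's vector.

    Assumption: a QSGD-style quantizer. Each coordinate is scaled by the
    layer's Euclidean norm and randomly rounded to a neighboring point of
    the grid {0, 1/s, ..., 1}, so its expectation equals the input.
    """
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return [0.0] * len(values)
    quantized = []
    for v in values:
        # Position of |v| / norm on the grid, clamped to [0, num_levels].
        scaled = min(abs(v) / norm * num_levels, float(num_levels))
        lower = math.floor(scaled)
        # Round up with probability equal to the fractional part.
        level = lower + (1 if rng.random() < scaled - lower else 0)
        quantized.append(math.copysign(level / num_levels * norm, v))
    return quantized


def quantize_layerwise(layers, levels_per_layer, rng):
    """Quantize each layer with its own (hypothetical) level count."""
    return [
        quantize_layer(layer, s, rng)
        for layer, s in zip(layers, levels_per_layer)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy "layers" with different scales and different level counts.
    layers = [[0.5, -1.0, 0.25], [3.0, 0.0, -4.0, 1.0]]
    print(quantize_layerwise(layers, [4, 16], rng))
```

Giving each layer its own norm and level count means a layer with small-magnitude values is not forced onto a grid sized for a much larger layer, which is the intuition behind quantizing per layer rather than over the whole flattened model.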

Videos

Layer-wise Quantization for Quantized Optimistic Dual Averaging (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning