Enhancing SignSGD: Small-Batch Convergence Analysis and a Hybrid Switching Strategy

Haoran Chen; Wentao Wang

arXiv:2604.25550 · cs.LG · April 29, 2026

TL;DR

This paper improves SignSGD in three ways: it analyzes convergence with small batches, adds a dithering technique that recovers magnitude information lost to the sign operator, and introduces a hybrid strategy that switches from SignSGD to SGD during training.

Contribution

The paper provides a new convergence analysis for SignSGD with small batches, injects annealed Gaussian noise as dithering before the sign operator, and proposes a smooth transition from SignSGD to SGD.
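To make the dithering idea concrete, here is a minimal scalar sketch, assuming zero-mean Gaussian dither with standard deviation sigma and a hypothetical exponential annealing schedule (`annealed_sigma`, `sigma0`, and `decay` are illustrative names, not taken from the paper). A hard sign maps g = 0.05 and g = 0.5 to the same bit, but the expected dithered sign equals erf(g/(σ√2)), which for small g is approximately g·√(2/π)/σ. Magnitude information is therefore recovered in expectation when dithered signs are averaged over steps or workers.

```python
import math
import random


def hard_sign(x: float) -> float:
    """1-bit quantizer; maps 0 to +1 so every output is exactly one bit."""
    return 1.0 if x >= 0.0 else -1.0


def dithered_sign(g: float, sigma: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian dither N(0, sigma^2) to g, then quantize to one bit."""
    return hard_sign(g + rng.gauss(0.0, sigma))


def expected_dithered_sign(g: float, sigma: float) -> float:
    """Closed form of E[sign(g + xi)] for xi ~ N(0, sigma^2).

    Equals 2*Phi(g/sigma) - 1 = erf(g / (sigma*sqrt(2))), which is smooth
    and strictly increasing in g, unlike the hard sign.
    """
    return math.erf(g / (sigma * math.sqrt(2.0)))


def annealed_sigma(step: int, sigma0: float = 0.1, decay: float = 0.999) -> float:
    """Hypothetical exponential annealing schedule for the dither scale."""
    return sigma0 * decay ** step


if __name__ == "__main__":
    rng = random.Random(0)
    sigma, n = 1.0, 20_000
    for g in (0.05, 0.2, 0.5):
        # Monte Carlo average of the dithered 1-bit outputs.
        mc = sum(dithered_sign(g, sigma, rng) for _ in range(n)) / n
        # hard_sign is +1 for all three inputs; the dithered average grows with g.
        print(f"g={g}: hard={hard_sign(g):+.0f}  "
              f"dithered_mean={mc:+.3f}  closed_form={expected_dithered_sign(g, sigma):+.3f}")
```

Annealing sigma toward zero trades this magnitude recovery for lower variance late in training; the schedule above is only a placeholder for whatever schedule the paper uses.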

Findings

01

Dithering improves SignSGD's accuracy on CIFAR-100.

02

The hybrid switching strategy outperforms both pure SGD and pure SignSGD on CIFAR-10.

03

A small-batch convergence rate is derived under unimodal symmetric gradient noise.
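For context on why batch size matters, the sketch below evaluates the standard per-coordinate sign-error bound that follows from Gauss's inequality. If a stochastic gradient coordinate has mean g, standard deviation σ, and unimodal symmetric noise, then with S = |g|/σ the probability of a wrong sign is at most 2/(9S²) when S > 2/√3 and at most 1/2 − S/(2√3) otherwise. This is not the paper's signal-to-noise weighted stationarity measure, whose exact form the abstract does not give; the function names are hypothetical.

```python
import math


def sign_error_bound(snr: float) -> float:
    """Upper bound on P[sign(noisy g) != sign(g)] under unimodal symmetric noise.

    snr = |g| / sigma for a single coordinate; follows from Gauss's inequality.
    """
    if snr < 0.0:
        raise ValueError("snr must be non-negative")
    if snr > 2.0 / math.sqrt(3.0):
        return 2.0 / (9.0 * snr ** 2)
    return 0.5 - snr / (2.0 * math.sqrt(3.0))


def gaussian_sign_error(snr: float) -> float:
    """Exact wrong-sign probability when the noise is Gaussian: Phi(-snr)."""
    return 0.5 * math.erfc(snr / math.sqrt(2.0))


def minibatch_snr(g: float, sigma: float, batch_size: int) -> float:
    """Averaging batch_size i.i.d. samples shrinks the noise std by sqrt(batch_size).

    Assumes the averaged noise remains unimodal symmetric.
    """
    return abs(g) * math.sqrt(batch_size) / sigma


if __name__ == "__main__":
    g, sigma = 0.1, 1.0  # a weak coordinate: per-sample SNR of 0.1
    for batch in (1, 16, 256, 4096):
        s = minibatch_snr(g, sigma, batch)
        print(f"B={batch:5d}  SNR={s:6.2f}  bound={sign_error_bound(s):.4f}  "
              f"gaussian={gaussian_sign_error(s):.4f}")
```

Once S exceeds 2/√3 the bound decays like 1/B, which is why large batches make the sign nearly deterministic. In the small-batch regime the wrong-sign probability stays close to 1/2 for low-SNR coordinates, and an analysis has to account for that.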

Abstract

SignSGD compresses each stochastic gradient coordinate to a single bit, offering substantial memory and communication savings, but its 1-bit quantization removes magnitude information and is known to leave a generalization gap relative to well-tuned SGD. We revisit SignSGD from a 1-bit quantization and dithering perspective and contribute three improvements. First, we derive a small-batch convergence rate for SignSGD under unimodal symmetric gradient noise using a signal-to-noise weighted stationarity measure, removing the large-batch assumption of prior analyses. Second, we inject annealed Gaussian noise before the sign operator, which acts as a classical dithering mechanism and probabilistically restores magnitude information lost to hard thresholding. Third, we adapt the SWATS strategy to sign-based updates with a projection-based learning-rate calibration that smoothly transitions…
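The projection-based calibration can be illustrated with a SWATS-style sketch. SWATS picks the SGD learning rate γ so that projecting −γg onto the current adaptive step p recovers p, which gives γ = (pᵀp)/(−pᵀg). For a sign step p = −η·sign(g) in d dimensions this simplifies to γ = η·d/‖g‖₁. The sketch keeps an exponential average of γ and performs a hard switch once the bias-corrected average settles, as in the original SWATS. It does not reproduce the paper's smooth transition, which the truncated abstract does not specify. `HybridSwitcher`, `beta`, and `eps` are hypothetical names and defaults.

```python
from typing import List, Optional


def sign_step(g: List[float], lr: float) -> List[float]:
    """SignSGD step p = -lr * sign(g), with sign(0) mapped to +1."""
    return [-lr if gi >= 0.0 else lr for gi in g]


def projected_sgd_lr(p: List[float], g: List[float]) -> Optional[float]:
    """SWATS-style calibration: gamma = (p.p) / (-p.g); None if undefined."""
    pg = sum(pi * gi for pi, gi in zip(p, g))
    if pg >= 0.0:  # zero gradient or not a descent direction
        return None
    return sum(pi * pi for pi in p) / (-pg)


class HybridSwitcher:
    """Hypothetical SignSGD -> SGD switcher modeled on SWATS (hard switch)."""

    def __init__(self, sign_lr: float, beta: float = 0.999, eps: float = 1e-9):
        self.sign_lr = sign_lr
        self.beta = beta
        self.eps = eps
        self.lam = 0.0                      # EMA of the calibrated SGD learning rate
        self.k = 0                          # number of EMA updates so far
        self.sgd_lr: Optional[float] = None  # set once the switch fires

    def step(self, w: List[float], g: List[float]) -> List[float]:
        if self.sgd_lr is not None:
            # SGD phase: plain gradient step with the calibrated rate.
            return [wi - self.sgd_lr * gi for wi, gi in zip(w, g)]
        # SignSGD phase: take the sign step and update the rate estimate.
        p = sign_step(g, self.sign_lr)
        gamma = projected_sgd_lr(p, g)
        if gamma is not None:
            self.k += 1
            self.lam = self.beta * self.lam + (1.0 - self.beta) * gamma
            lam_hat = self.lam / (1.0 - self.beta ** self.k)  # bias correction
            # k > 1 because a single-sample corrected average equals gamma trivially.
            if self.k > 1 and abs(lam_hat - gamma) < self.eps:
                self.sgd_lr = lam_hat  # later calls use SGD
        return [wi + pi for wi, pi in zip(w, p)]


if __name__ == "__main__":
    sw = HybridSwitcher(sign_lr=0.01)
    w = [0.0, 0.0, 0.0]
    g = [0.5, -0.25, 0.25]  # toy constant gradient
    for t in range(3):
        w = sw.step(w, g)
        print(t, sw.sgd_lr, w)
```

With a constant gradient γ_k is constant, so the bias-corrected average matches it and the switch fires on the second call. For g = [0.5, -0.25, 0.25] and η = 0.01 the calibrated rate is η·d/‖g‖₁ = 0.03. With real stochastic gradients the average settles only gradually, so eps controls how early the switch happens.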
