SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

Haoyu Huang; Boyu Liu; Linlin Yang; Yanjing Li; Yuguang Yang; Xuhui Liu; Canyu Chen; Zhongqian Fu; Baochang Zhang

arXiv:2605.10989 · cs.LG · May 18, 2026

TL;DR

SURGE introduces a learnable gradient adaptation framework for binary neural networks (BNNs). It improves training stability and accuracy by addressing gradient mismatch and the information loss caused by fixed-range gradient clipping.

Contribution

It proposes a novel Dual-Path Gradient Compensator (DPGC) and an adaptive gradient scaler, which together provide a theoretically grounded, learnable alternative to hand-crafted gradient approximations for BNN training.

Findings

01. SURGE outperforms state-of-the-art methods on image classification tasks.

02. It improves training stability and reduces gradient mismatch in BNNs.

03. Experiments show enhanced performance in object detection and language understanding.

Abstract

The training of Binary Neural Networks (BNNs) is fundamentally based on gradient approximation for non-differentiable binarization operations (e.g., the sign function). However, prevailing methods, including the Straight-Through Estimator (STE) and its improved variants, rely on hand-crafted designs that suffer from the gradient mismatch problem and from information loss induced by fixed-range gradient clipping. To address this, we propose SURrogate GradiEnt Adaptation (SURGE), a novel learnable gradient compensation framework with theoretical grounding. SURGE mitigates gradient mismatch through auxiliary backpropagation. Specifically, we design a Dual-Path Gradient Compensator (DPGC) that constructs a parallel full-precision auxiliary branch for each binarized layer, decoupling gradient flow via output decomposition during backpropagation. DPGC enables bias-reduced gradient estimation by leveraging…
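The abstract positions SURGE against the Straight-Through Estimator (STE) with fixed-range gradient clipping. For reference, here is a minimal NumPy sketch of that standard clipped STE baseline. It is not SURGE or DPGC, whose details are truncated above. The clip threshold of 1.0, the +1 mapping for zero inputs, and the function names `sign_forward` / `ste_backward` are illustrative assumptions.

```python
import numpy as np

def sign_forward(x):
    """Binarize to {-1, +1}; mapping 0 to +1 is an assumed convention."""
    return np.where(x >= 0, 1.0, -1.0)

def ste_backward(x, grad_out, clip=1.0):
    """Clipped STE: use d sign(x)/dx := 1 inside [-clip, clip], 0 outside.

    This is the hand-crafted surrogate the abstract criticizes; `clip` is a
    hypothetical fixed threshold, not a value taken from the paper.
    """
    mask = (np.abs(x) <= clip).astype(x.dtype)
    return grad_out * mask

# Latent (full-precision) values feeding a binarized layer.
x = np.array([-2.5, -0.4, 0.0, 0.7, 1.8])
grad_out = np.ones_like(x)  # upstream gradient dL/d(sign(x))

b = sign_forward(x)
g_ste = ste_backward(x, grad_out)
g_true = np.zeros_like(x)  # the true derivative of sign is 0 almost everywhere

print("binary outputs:", b)
print("STE gradient:  ", g_ste)
print("mismatch vs. true derivative:", np.abs(g_ste - g_true))
print("entries zeroed by clipping:", int(np.sum(g_ste == 0)))
```

The entries at -2.5 and 1.8 receive no gradient at all, which illustrates the information loss from fixed-range clipping. Inside the window, the surrogate value of 1 differs from the true derivative of sign, which illustrates gradient mismatch. According to the abstract, SURGE replaces this fixed rule with learnable compensation.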
