Fast and Slow Gradient Approximation for Binary Neural Network Optimization

Xinquan Chen; Junqi Gao; Biqing Qi; Dong Li; Yiang Luo; Fangyuan Li; Pengfei Li

arXiv:2412.11777 · cs.LG · December 17, 2024

TL;DR

This paper introduces a novel gradient approximation method for Binary Neural Networks that leverages historical gradient data and layer-specific embeddings, resulting in faster convergence and improved accuracy.

Contribution

It proposes the Fast and Slow Gradient Generation (FSG) method with Historical Gradient Storage and Layer Recognition Embeddings to enhance hypernetwork-based BNN optimization.

Findings

1. FSG achieves faster convergence on CIFAR datasets.
2. The method reduces loss values compared to baselines.
3. Incorporating historical gradients improves gradient estimation accuracy.

Abstract

Binary Neural Networks (BNNs) have garnered significant attention due to their immense potential for deployment on edge devices. However, the non-differentiability of the quantization function poses a challenge for the optimization of BNNs, as its derivative cannot be backpropagated. To address this issue, hypernetwork based methods, which utilize neural networks to learn the gradients of non-differentiable quantization functions, have emerged as a promising approach due to their adaptive learning capabilities to reduce estimation errors. However, existing hypernetwork based methods typically rely solely on current gradient information, neglecting the influence of historical gradients. This oversight can lead to accumulated gradient errors when calculating gradient momentum during optimization. To incorporate historical gradient information, we design a Historical Gradient Storage (HGS)…
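To make the non-differentiability problem in the abstract concrete, here is a minimal sketch in plain Python. It checks numerically that the derivative of a sign quantizer is zero away from the origin, then applies a clipped straight-through estimator (STE), a common hand-crafted surrogate. The STE is shown only as a familiar baseline, not as the paper's FSG method or its hypernetwork; the function names, the clipping threshold, and the sample values are all illustrative assumptions.

```python
def sign(x: float) -> float:
    """Binarize a real value to -1.0 or +1.0 (0 maps to +1.0 by convention here)."""
    return 1.0 if x >= 0.0 else -1.0


def finite_difference(f, x: float, eps: float = 1e-4) -> float:
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def ste_grad(x: float, upstream: float, clip: float = 1.0) -> float:
    """Clipped straight-through estimator (assumed baseline, not FSG):
    pass the upstream gradient through unchanged where |x| <= clip,
    and block it elsewhere."""
    return upstream if abs(x) <= clip else 0.0


# Hypothetical latent (real-valued) weights, chosen away from the jump at 0.
weights = [-1.7, -0.4, 0.3, 0.9]

# The numerical derivative of sign() is zero at every sample point,
# so backpropagating it would give the weights no learning signal.
true_grads = [finite_difference(sign, w) for w in weights]

# The surrogate lets an upstream gradient of 0.5 reach weights inside the clip range.
surrogate_grads = [ste_grad(w, upstream=0.5) for w in weights]

print(true_grads)
print(surrogate_grads)
```

A fixed surrogate of this kind is the same for every layer and every step, which is the limitation that learned, hypernetwork-based gradient estimators are meant to address.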

Code & Models

Repositories

two-tiger/fsg (PyTorch, official)

Videos

Fast and Slow Gradient Approximation for Binary Neural Network Optimization · Underline

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need · HyperNetwork