BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods to Deep Binary Model
Junjie Liu, Dongchao Wen, Deyu Wang, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato

TL;DR
This paper introduces BAMSProd, an adaptive optimization algorithm designed to improve the training and convergence of deep binary neural networks (BNNs). It constrains gradient ranges and reduces errors, which leads to faster training and better performance.
Contribution
The paper proposes BAMSProd, a novel adaptive optimizer that addresses gradient mismatch issues in BNNs, enhancing convergence and accuracy over existing methods.
Findings
BAMSProd converges approximately 1.2 times faster.
It improves BNN performance by about 3.7% over existing optimizers.
Theoretical analysis confirms convergence benefits of the method.
Abstract
Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing effective and efficient training of BNNs remains an unsolved problem. The main reason is that the estimated gradients produced by the Straight-Through Estimator (STE) mismatch the gradients of the real derivatives. In this paper, we provide an explicit convex optimization example where training BNNs with traditional adaptive optimization methods still faces the risk of non-convergence, and we identify that constraining the range of gradients is critical for optimizing the deep binary model and avoiding highly suboptimal solutions. To solve the above issues, we propose the BAMSProd algorithm, based on the key observation that the convergence property of optimizing a deep binary model is strongly related to the quantization errors. In brief, it employs an adaptive range…
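The abstract is truncated before it states BAMSProd's update rule, so the following is only a minimal Python sketch of the two ingredients it names, under stated assumptions. The first ingredient is the STE gradient mismatch: the true derivative of `sign` is zero almost everywhere, while the STE passes a nonzero surrogate. The second is an AMSGrad-style optimizer (the method listed under Methods) fed range-constrained gradients. The hard-tanh STE window, the fixed `grad_bound`, the class name `ClippedAMSGrad`, and the one-weight toy problem are all hypothetical illustrations. This is not the authors' BAMSProd, whose range is adaptive rather than fixed.

```python
import math


def sign(x: float) -> float:
    """Binarize a latent real weight to {-1, +1} (0 maps to +1 by convention)."""
    return 1.0 if x >= 0.0 else -1.0


def true_sign_grad(x: float) -> float:
    """Real derivative of sign(x): zero everywhere except x == 0 (undefined there)."""
    return 0.0


def ste_grad(x: float, upstream: float, clip: float = 1.0) -> float:
    """Straight-through estimator with an assumed hard-tanh window:
    pass the upstream gradient through where |x| <= clip, else return 0."""
    return upstream if abs(x) <= clip else 0.0


class ClippedAMSGrad:
    """AMSGrad (no bias correction) applied to range-clipped gradients.

    Hypothetical: grad_bound is a fixed bound standing in for BAMSProd's
    adaptive range, whose exact rule is not reproduced here."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, grad_bound=1.0):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.eps, self.grad_bound = eps, grad_bound
        self.m = 0.0      # first-moment estimate
        self.v = 0.0      # second-moment estimate
        self.v_hat = 0.0  # running max of v (the AMSGrad correction)

    def step(self, param: float, grad: float) -> float:
        # Constrain the gradient to [-grad_bound, grad_bound].
        g = max(-self.grad_bound, min(self.grad_bound, grad))
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * g
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * g * g
        self.v_hat = max(self.v_hat, self.v)
        return param - self.lr * self.m / (math.sqrt(self.v_hat) + self.eps)


def train_toy(steps: int = 20, w: float = 0.5, x: float = 1.0, y: float = -1.0):
    """Fit one binarized weight so that sign(w) * x matches the target y."""
    opt = ClippedAMSGrad()
    for _ in range(steps):
        out = sign(w) * x          # forward pass uses the binary weight
        d_out = 2.0 * (out - y)    # dL/d(out) for squared error
        d_b = d_out * x            # dL/d(binary weight)
        d_w = ste_grad(w, d_b)     # STE surrogate; the real dL/dw would be 0
        w = opt.step(w, d_w)
    return w, (sign(w) * x - y) ** 2


w, final_loss = train_toy()
print(sign(w), final_loss)  # -1.0 0.0
```

In this toy run, the real derivative would never move `w`, since `true_sign_grad` is 0. The STE surrogate flips the binary weight to the target sign within a few steps. Clipping bounds the size of each gradient fed into the moment estimates, which is the kind of range constraint the abstract argues matters.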
Taxonomy
Topics: Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
Methods: AMSGrad
