Layerwise Progressive Freezing Enables STE-Free Training of Deep Binary Neural Networks
Evan Gibson Smith, Bashima Islam

TL;DR
This paper introduces StoMPP, a layerwise progressive freezing method for training deep binary neural networks without straight-through estimators (STE). It yields significant accuracy gains, especially in deeper models.
Contribution
The paper proposes StoMPP, a novel stochastic masking approach to progressive freezing that trains deep binary neural networks without STE. Under a matched training recipe, it outperforms a BinaryConnect-style STE baseline.
Findings
StoMPP improves accuracy over STE baselines, especially in deep networks.
Layerwise stochastic masking makes progressive freezing work for full binary neural networks (binary weights and activations), a setting where global progressive freezing fails.
Training dynamics show non-monotonic convergence and better depth scaling.
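Full binary networks are harder than binary-weight networks because activations are binarized too. A hard step has zero derivative almost everywhere, so without an STE any gradient path that crosses a frozen activation is cut. The toy sketch below illustrates that effect on a scalar chain of activations. The function names and the "same frozen fraction in every layer" model of global freezing are illustrative assumptions, not the paper's analysis. Real layers have many units, so gradients can still reach earlier layers through any unfrozen unit.

```python
from typing import List


def chain_input_grad(x0: float, frozen_per_layer: List[bool]) -> float:
    """d a_D / d a_0 for a toy scalar chain a_{l+1} = act_l(a_l), computed without an STE.

    act_l is a hard step if frozen_per_layer[l] is True, otherwise a clip to [-1, 1].
    (Toy model chosen for illustration only.)
    """
    a, grad = x0, 1.0
    for frozen in frozen_per_layer:
        if frozen:
            local = 0.0                               # step derivative is 0 almost everywhere
            a = 1.0 if a >= 0.0 else -1.0
        else:
            local = 1.0 if -1.0 <= a <= 1.0 else 0.0  # clip derivative on [-1, 1]
            a = max(-1.0, min(1.0, a))
        grad *= local                                 # chain rule: one zero blocks the whole path
    return grad


def path_survival_prob(frozen_frac: float, depth: int) -> float:
    """Probability that a single scalar path crosses no frozen activation, assuming every
    layer freezes its unit independently with the same probability frozen_frac."""
    return (1.0 - frozen_frac) ** depth
```

One frozen activation anywhere in the chain zeroes the input gradient. Under this toy model, a uniform frozen fraction of 0.1 leaves about 35% of paths intact at depth 10 but only about 0.5% at depth 50. This is one way to see why the effect compounds with depth.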
Abstract
We investigate progressive freezing as an alternative to straight-through estimators (STE) for training binary networks from scratch. Under controlled training conditions, we find that while global progressive freezing works for binary-weight networks, it fails for full binary neural networks due to activation-induced gradient blockades. We introduce StoMPP (Stochastic Masked Partial Progressive Binarization), which uses layerwise stochastic masking to progressively replace differentiable clipped weights/activations with hard binary step functions, while only backpropagating through the unfrozen (clipped) subset (i.e., no straight-through estimator). Under a matched minimal training recipe, StoMPP improves accuracy over a BinaryConnect-style STE baseline, with gains that increase with depth (e.g., for ResNet-50 BNN: +18.0 on CIFAR-10, +13.5 on CIFAR-100, and +3.8 on ImageNet; for…
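The masking step described in the abstract can be sketched for a single tensor as follows. Frozen entries pass through a hard binary step. Unfrozen entries pass through the differentiable clip. The backward pass returns gradient only for the unfrozen, clipped entries, with no straight-through estimator.

The sketch rests on several assumptions rather than details from the paper:

- Each entry is frozen independently with probability equal to a per-layer frozen fraction, and the mask is resampled on every call. A persistent variant would keep previously frozen entries frozen.
- The step maps 0 to +1.
- The linear layer-by-layer schedule `layer_frozen_fraction` is a hypothetical stand-in for whatever schedule StoMPP actually uses.

All function names are illustrative.

```python
import random
from typing import List


def clip(x: float) -> float:
    """Differentiable surrogate used for unfrozen entries: clip to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def hard_step(x: float) -> float:
    """Hard binary step used for frozen entries (x >= 0 maps to +1, an assumed convention)."""
    return 1.0 if x >= 0.0 else -1.0


def sample_mask(n: int, frozen_frac: float, rng: random.Random) -> List[bool]:
    """True marks a frozen entry; each entry is frozen independently with prob frozen_frac."""
    return [rng.random() < frozen_frac for _ in range(n)]


def masked_forward(x: List[float], frozen: List[bool]) -> List[float]:
    """Frozen entries use the hard step; unfrozen entries use the clip."""
    return [hard_step(v) if f else clip(v) for v, f in zip(x, frozen)]


def masked_backward(x: List[float], frozen: List[bool], grad_out: List[float]) -> List[float]:
    """Gradient w.r.t. x without a straight-through estimator.

    Frozen entries receive 0 (the step's derivative is 0 almost everywhere).
    Unfrozen entries receive grad_out times the clip derivative, taken as 1 on the
    closed interval [-1, 1] and 0 outside it.
    """
    return [0.0 if f else (g if -1.0 <= v <= 1.0 else 0.0)
            for v, f, g in zip(x, frozen, grad_out)]


def layer_frozen_fraction(step: int, total_steps: int, layer_idx: int, num_layers: int) -> float:
    """Hypothetical layerwise schedule: training is split into num_layers equal windows,
    and layer layer_idx ramps its frozen fraction linearly from 0 to 1 during its window."""
    window = total_steps / num_layers
    start = layer_idx * window
    return min(1.0, max(0.0, (step - start) / window))
```

The following usage example was traced by hand. With `x = [-1.5, -0.3, 0.4, 2.0]` and `frozen = [False, True, False, True]`, `masked_forward(x, frozen)` returns `[-1.0, -1.0, 0.4, 1.0]`. Then `masked_backward(x, frozen, [1.0] * 4)` returns `[0.0, 0.0, 1.0, 0.0]`. Only the unfrozen entry inside [-1, 1] receives gradient.

In a training loop, one would call `sample_mask(n, layer_frozen_fraction(step, total_steps, i, num_layers), rng)` for layer `i` at each step. The network is then fully binary once every layer's fraction reaches 1.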
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
