BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Luca Colombo, Fabrizio Pittorino, Daniele Zambon, Carlo Baldassi, Manuel Roveri, Cesare Alippi

TL;DR
This paper introduces Binary Error Propagation (BEP), a novel binary training algorithm for neural networks that enables fully binary, efficient, end-to-end training, including for recurrent architectures, with improved accuracy.
Contribution
BEP is the first discrete backpropagation-like algorithm for binary neural networks, allowing error signals to be propagated using only binary operations.
Findings
Improves accuracy by up to +6.89% on multilayer perceptrons
Improves accuracy by up to +10.57% on recurrent neural networks
Operates entirely with binary computations, enhancing efficiency
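As a rough illustration of what "binary computations" can mean in practice, the sketch below computes a dot product between two ±1 vectors using only XOR and a bit count. It is not taken from the paper and is not BEP itself. The helper names (`pack_bits`, `binary_dot`, `sign_activation`) and the mapping of +1 to bit 1 are assumptions made for this example.

```python
# Illustrative only: a +/-1 dot product computed with XOR and popcount.
# Helper names and the +1 -> bit 1 encoding are assumptions, not the paper's API.

def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s not in (1, -1):
            raise ValueError("entries must be +1 or -1")
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +/-1 vectors of length n.

    Matching bits contribute +1 and differing bits contribute -1,
    so the result is n - 2 * popcount(a XOR b).
    """
    disagreements = bin(a_word ^ b_word).count("1")
    return n - 2 * disagreements


def sign_activation(pre_activation):
    """Binary activation; ties at zero are mapped to +1 (an arbitrary choice here)."""
    return 1 if pre_activation >= 0 else -1


x = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]

packed_result = binary_dot(pack_bits(x), pack_bits(w), len(x))
reference = sum(xi * wi for xi, wi in zip(x, w))  # ordinary integer dot product
```

Here `packed_result` and `reference` agree, which is the property that lets a binary layer replace multiply-accumulate operations with bitwise ones.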
Abstract
Binary Neural Networks (BNNs), which constrain both weights and activations to binary values, offer substantial reductions in computational complexity, memory footprint, and energy consumption. These advantages make them particularly well suited for deployment on resource-constrained devices. However, training BNNs via gradient-based optimization remains challenging due to the discrete nature of their variables. The dominant approach, quantization-aware training, circumvents this issue by employing surrogate gradients. Yet, this method requires maintaining latent full-precision parameters and performing the backward pass with floating-point arithmetic, thereby forfeiting the efficiency of binary operations during training. While alternative approaches based on local learning rules exist, they are unsuitable for global credit assignment and for back-propagating errors in multi-layer…
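To make the abstract's point about quantization-aware training concrete, here is a minimal sketch of one surrogate-gradient (straight-through estimator) update for a single linear unit. It illustrates the baseline approach the abstract criticizes, not BEP. The function name `qat_step`, the squared-error loss, the clipping rule, and the learning rate are all assumptions chosen for this example.

```python
# Illustrative only: one straight-through-estimator (STE) update for a single unit.
# Shows why QAT keeps full-precision latent weights and a floating-point backward pass.

def sign(v):
    """Binarize to +1.0 / -1.0; zero is mapped to +1.0 (an arbitrary choice here)."""
    return 1.0 if v >= 0 else -1.0


def qat_step(latent_w, x, target, lr=0.1):
    """Return (updated latent weights, forward output) for loss 0.5 * (y - target)^2."""
    binary_w = [sign(w) for w in latent_w]            # forward pass uses binary weights
    y = sum(wb * xi for wb, xi in zip(binary_w, x))
    error = y - target                                # dL/dy, a real number
    # STE assumption: treat d sign(w)/dw as 1 when |w| <= 1 and 0 otherwise.
    grads = [error * xi if abs(w) <= 1.0 else 0.0 for w, xi in zip(latent_w, x)]
    new_latent = [w - lr * g for w, g in zip(latent_w, grads)]  # float update
    return new_latent, y


latent_w = [0.05, -0.3, 0.8]      # full-precision parameters kept during training
x = [1.0, -1.0, 1.0]
new_latent, y = qat_step(latent_w, x, target=-1.0)
new_binary = [sign(w) for w in new_latent]
```

In this sketch the weights used at inference are binary, but every quantity in the update (`error`, `grads`, `new_latent`) is a float, which is the training-time cost the abstract refers to.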
Peer Reviews
Decision: ICLR 2026 Poster
Novelty: the paper introduces an alternative to standard backpropagation by reformulating the algorithm to work with binary weights. Unlike other approaches that rely on real-valued parameters, such as the straight-through estimator, this work computes both the forward and backward passes in the binary domain. The work shows consistent improvements on the classification task across two datasets.
1. Lack of organization: the paper is difficult to read smoothly because figures and tables are placed away from where they are mentioned. For example, Figure 2 is on page 7 but mentioned on page 8, and Table 1 is on page 8 but mentioned on page 9. 2. The authors claim computational efficiency and memory reduction, but ablation studies on FLOPs, training time, and inference time, as well as details of the computing equipment, are not provided. 3. The authors propose several hyperparameters, even though Section 4.4 contains an analysis. Furth
- The paper is clearly written and well organized, with notation that is consistent and easy to follow. - The presentation is concise and direct, and I found the mathematical derivations correct and well-grounded. - The method is novel and well motivated, and the experimental results are consistent with most of the claims.
- In Section 3.1, the authors obtain binary input representations using a fixed binarization method (median or thermometer encoding) but do not analyze how different binarization functions affect BEP's performance. - The margin parameter $r$ in Eq. (1) determines when binary updates are triggered, but its value and sensitivity are not analyzed. Since this parameter effectively affects the learning dynamics, the authors should provide an ablation to show how it influences performance. - The pap
1. The method is reasonable, and the formalization of a binary version of global credit assignment for BNNs is clear. 2. The empirical results are broad, showing the advantage of BEP.
1. The experiments are not sufficient. Although the authors conduct experiments on both MLP and RNN models to demonstrate the effectiveness of their method, the experimental setup appears somewhat toy. Could the authors perform experiments on larger-scale networks (e.g., comparable to ResNet in size) and datasets (e.g., ImageNet-1k)? In addition, since the BNN field has been studied for a long time, could the authors compare BEP with other BNN methods (like [1-3]) beyond QAT to further validate
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
