BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training

Luca Colombo; Fabrizio Pittorino; Daniele Zambon; Carlo Baldassi; Manuel Roveri; Cesare Alippi

arXiv:2512.04189 · cs.LG · February 18, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Binary Error Propagation (BEP), a novel training algorithm for binary neural networks that enables fully binary, efficient, end-to-end training, including of recurrent architectures, with improved accuracy.

Contribution

BEP is the first discrete backpropagation-like algorithm for binary neural networks, allowing error signals to be propagated using only binary operations.

Findings

01

Achieves accuracy gains of up to +6.89% on multilayer perceptrons

02

Achieves accuracy gains of up to +10.57% on recurrent neural networks

03

Operates entirely with binary computations, enhancing efficiency
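The efficiency claim rests on a standard BNN identity rather than anything specific to BEP: if +1/-1 values are stored one per bit, the dot product of two length-n vectors reduces to an XNOR followed by a population count, because each matching position contributes +1 and each mismatch -1, giving 2 * matches - n. Storing one bit per value instead of a 32-bit float is also where the roughly 32x memory saving comes from. The sketch below is a minimal pure-Python illustration under that encoding; the helper names `pack_signs` and `xnor_popcount_dot` are hypothetical, and real kernels work on machine words with hardware popcount instructions.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int bitmask (bit i set means +1)."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("expected only +1/-1 entries")
        if v == 1:
            mask |= 1 << i
    return mask


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors given their packed bitmasks.

    XNOR marks the positions where the signs agree; matches contribute +1 and
    mismatches -1, so dot = matches - (n - matches) = 2 * matches - n.
    """
    all_ones = (1 << n) - 1                      # keep only the low n bits
    matches = bin(~(a_bits ^ b_bits) & all_ones).count("1")
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(xnor_popcount_dot(pack_signs(a), pack_signs(b), len(a)))  # 1 - 1 - 1 + 1 = 0
```

The same multiply-accumulate that costs n floating-point operations in a real-valued layer becomes one XOR, one mask, and one popcount per machine word.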

Abstract

Binary Neural Networks (BNNs), which constrain both weights and activations to binary values, offer substantial reductions in computational complexity, memory footprint, and energy consumption. These advantages make them particularly well suited for deployment on resource-constrained devices. However, training BNNs via gradient-based optimization remains challenging due to the discrete nature of their variables. The dominant approach, quantization-aware training, circumvents this issue by employing surrogate gradients. Yet, this method requires maintaining latent full-precision parameters and performing the backward pass with floating-point arithmetic, thereby forfeiting the efficiency of binary operations during training. While alternative approaches based on local learning rules exist, they are unsuitable for global credit assignment and for back-propagating errors in multi-layer…
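For contrast, quantization-aware training of the kind the abstract describes typically uses the straight-through estimator (STE). The following is a generic sketch, not code from the paper or from any library: a single linear neuron with a squared loss (an assumption made for brevity), latent real-valued weights `w_real`, a forward pass through `sign(w_real)`, and a backward pass that copies the gradient of each binary weight onto its latent weight, zeroed where |w| > 1 (a common clipped-STE variant). Both the gradient and the update are floating-point, which is the overhead BEP aims to remove. The function name `ste_step` is hypothetical.

```python
def sign(x):
    """Binarize to +1/-1 (0 maps to +1 by convention here)."""
    return 1.0 if x >= 0 else -1.0


def ste_step(w_real, x, target, lr=0.01):
    """One SGD step on a single linear neuron with binary weights (sketch).

    Forward: uses the binarized weights sign(w_real).
    Backward: dL/dw_bin is passed "straight through" to w_real, but zeroed
    where |w_real| > 1 (clipped STE). The update is floating-point and is
    applied to the latent full-precision weights, not the binary ones.
    """
    w_bin = [sign(w) for w in w_real]
    y = sum(wb * xi for wb, xi in zip(w_bin, x))      # binary forward pass
    grad_y = y - target                                # d/dy of 0.5 * (y - target)**2
    new_w = []
    for w, xi in zip(w_real, x):
        grad_w_bin = grad_y * xi                       # gradient w.r.t. binary weight
        grad_w_real = grad_w_bin if abs(w) <= 1 else 0.0  # surrogate: clipped identity
        new_w.append(w - lr * grad_w_real)
    return new_w, y
```

With `w_real = [0.05, -0.2, 0.5]`, `x = [1, 1, -1]` and `target = 3`, the binary forward output is -1, and one step with `lr=0.01` moves the latent weights to about `[0.09, -0.16, 0.46]` while their signs, and hence the deployed binary weights, stay unchanged. Small updates accumulate in the latent weights until a sign flips, which is why this style of training has to keep them in full precision.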

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

Novelty: the paper introduces an alternative to standard backpropagation by reformulating the algorithm to work with binary weights. Unlike approaches that rely on real-valued parameters, such as the straight-through estimator, this work computes both the forward and backward passes in the binary domain. The work shows consistent improvements on classification across two datasets.

Weaknesses

1. Lack of organization: the paper is difficult to read smoothly because figures and tables appear far from where they are referenced. For example, Figure 2 is on page 7 but is mentioned on page 8, and Table 1 is on page 8 but is mentioned on page 9.
2. The authors claim computational efficiency and memory reduction, but ablation studies on FLOPs, training time, and inference time are missing, and the computational hardware used is not stated.
3. The authors propose several hyperparameters, even though Section 4.4 includes an analysis. Furth

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The paper is clearly written and well organized, with notation that is consistent and easy to follow.
- The presentation is concise and direct, and I found the mathematical derivations correct and well-grounded.
- The method is novel and well motivated, and the experimental results are consistent with most of the claims.

Weaknesses

- In Section 3.1, the authors obtain binary input representations using a fixed binarization method (median or thermometer encoding) but do not analyze how different binarization functions affect BEP's performance.
- The margin parameter $r$ in Eq. (1) determines when binary updates are triggered, but its value and sensitivity are not analyzed. Since this parameter directly affects the learning dynamics, the authors should provide an ablation showing how it influences performance.
- The pap
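The two input binarization schemes mentioned in this review are not defined on this page. A minimal sketch of common versions follows, assuming +1/-1 outputs, a per-feature median computed over the data column, and evenly spaced thermometer thresholds over a known range [lo, hi]. The paper's exact thresholds are not given here, and the function names are hypothetical.

```python
from statistics import median


def median_binarize(column):
    """Map each value in a feature column to +1 if above the column median, else -1."""
    m = median(column)
    return [1 if v > m else -1 for v in column]


def thermometer_encode(value, lo, hi, levels):
    """Encode a scalar as `levels` +1/-1 bits using evenly spaced thresholds.

    Threshold k (k = 1..levels) is lo + k * (hi - lo) / (levels + 1); bit k is
    +1 when the value exceeds it, so larger values switch on more bits, like
    mercury rising in a thermometer.
    """
    step = (hi - lo) / (levels + 1)
    return [1 if value > lo + k * step else -1 for k in range(1, levels + 1)]


print(median_binarize([0.1, 0.9, 0.5, 0.3]))    # median is 0.4
print(thermometer_encode(0.7, 0.0, 1.0, 4))     # thresholds near 0.2, 0.4, 0.6, 0.8
```

Here `median_binarize([0.1, 0.9, 0.5, 0.3])` returns `[-1, 1, 1, -1]` and `thermometer_encode(0.7, 0.0, 1.0, 4)` returns `[1, 1, 1, -1]`. The median scheme spends one bit per feature and keeps only which side of the median a value falls on; the thermometer code spends `levels` bits per feature to preserve coarse magnitude information.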

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. The method is reasonable, and the formalization of a binary version of global credit assignment for BNNs is clear.
2. The empirical results are broad and show the advantage of BEP.

Weaknesses

1. The experiments are not sufficient. Although the authors conduct experiments on both MLP and RNN models to demonstrate the effectiveness of their method, the experimental setup appears somewhat toy-scale. Could the authors perform experiments on larger networks (e.g., comparable in size to ResNet) and datasets (e.g., ImageNet-1k)? In addition, since BNNs have been studied for a long time, could the authors compare BEP with other BNN methods (like [1-3]) beyond QAT to further validate

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Stochastic Gradient Optimization Techniques