Towards the Training of Deeper Predictive Coding Neural Networks

Chang Qi; Matteo Forasassi; Thomas Lukasiewicz; Tommaso Salvatori

arXiv:2506.23800·cs.LG·October 13, 2025

Open Access 3 Reviews

TL;DR

This paper identifies key issues limiting deep predictive coding networks and proposes novel solutions, including precision-weighted optimization, error balancing, and auxiliary neurons, enabling performance comparable to backpropagation in deep models.

Contribution

The authors introduce new methods to improve training of deep predictive coding networks, addressing error imbalance, energy propagation, and residual effects, thus enhancing their scalability and effectiveness.

Findings

01

Achieved performance comparable to backpropagation on deep ResNets.

02

Balanced error distributions improve training stability.

03

Auxiliary neurons slow energy propagation in residual connections.

Abstract

Predictive coding networks are neural models that perform inference through an iterative energy minimization process, whose operations are local in space and time. While effective in shallow architectures, they suffer significant performance degradation beyond five to seven layers. In this work, we show that this degradation is caused by exponentially imbalanced errors between layers during weight updates, and by predictions from the previous layers not being effective in guiding updates in deeper layers. Furthermore, when training models with skip connections, the energy propagated by the residuals reaches higher layers faster than that propagated by the main pathway, affecting test accuracy. We address the first issue by introducing a novel precision-weighted optimization of latent variables that balances error distributions during the relaxation phase, the second issue by proposing a…
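The relaxation phase described in the abstract can be illustrated with a toy sketch. It is not the authors' method: it assumes a scalar linear chain in which layer l predicts the next state as `w[l] * x[l]`, a fixed per-layer precision `pi[l]` (the paper's precisions are described as depending on time and depth, and their schedule is not given here), and plain gradient descent on the hidden states. The names `forward_init`, `layer_errors`, `energy`, and `relax` are hypothetical.

```python
def forward_init(x_in, w, target):
    """Initialise states with a forward pass, then clamp the output to the target."""
    x = [x_in]
    for w_l in w:
        x.append(w_l * x[-1])
    x[-1] = target  # after clamping, only the last layer carries a nonzero error
    return x


def layer_errors(x, w):
    """Prediction errors e[l] = x[l+1] - w[l] * x[l] for a scalar linear chain."""
    return [x[l + 1] - w[l] * x[l] for l in range(len(w))]


def energy(x, w, pi):
    """Precision-weighted energy: sum over layers of 0.5 * pi[l] * e[l]**2."""
    return sum(0.5 * p * e * e for p, e in zip(pi, layer_errors(x, w)))


def relax(x, w, pi, steps=50, lr=0.1):
    """Gradient descent on hidden states; x[0] (input) and x[-1] (target) stay clamped."""
    x = list(x)
    history = [energy(x, w, pi)]
    hidden = range(1, len(x) - 1)
    for _ in range(steps):
        e = layer_errors(x, w)
        # dE/dx[l] = pi[l-1] * e[l-1] - pi[l] * w[l] * e[l]: each update uses only
        # the errors of the two adjacent layers, so the rule is local.
        grads = [pi[l - 1] * e[l - 1] - pi[l] * w[l] * e[l] for l in hidden]
        for l, g in zip(hidden, grads):
            x[l] -= lr * g
        history.append(energy(x, w, pi))
    return x, history


if __name__ == "__main__":
    w = [0.9, 0.9, 0.9, 0.9]          # assumed toy weights
    pi = [1.0, 1.0, 1.0, 1.0]         # assumed uniform precisions
    x0 = forward_init(1.0, w, target=2.0)
    print("initial errors:", layer_errors(x0, w))
    x, hist = relax(x0, w, pi)
    print("relaxed errors:", layer_errors(x, w))
    print("energy before/after:", hist[0], hist[-1])
```

In this toy setting the error starts concentrated in the layer next to the clamped output and spreads towards the input only gradually during relaxation, which gives a rough picture of the imbalance the abstract refers to. Changing `pi` per layer rescales how strongly each layer's error drives the state updates.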

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

The main strength of the submission is the clear correspondence between the stated problem and the main results (in Table 2). The proposed modifications improve PCN accuracy for deeper models. In addition, the proposed structural update to skip connections and batch normalization improves accuracy for ResNet models. Moreover, the more layers there are, the greater the gain.

Weaknesses

Below, I list the weaknesses observed in the presented submission: 1. The motivation for using PCN instead of standard backpropagation (BP) remains unclear to me. Experiments demonstrate that BP typically achieves higher test accuracy; therefore, the rationale for considering such a training scheme must be clearly explained at the outset, highlighting the advantages of PC over BP. 2. Experiments are smoothly distributed over the sections, and many references to Figures are missing in t

Reviewer 02 · Rating 2 · Confidence 4

Strengths

It proposes practical, PC-compatible remedies—precision scheduling, forward-aware updates, and small architectural tweaks—that are simple to implement yet demonstrably restore performance to near–backprop levels on deeper CNNs, including ResNet-18 on Tiny-ImageNet.

Weaknesses

**1. Framing/positioning issues**
* Even though energy-based models (EBMs) in deep learning have continuously advanced (e.g., JEM, diffusion models), the authors single out only Hopfield-style energy functions and predictive coding as EBMs and frame the work as an effort to “scale up” a very general EBM framework; this characterization is somewhat misleading.
**2. Notation / mathematical clarity**
* Notation and formatting are inconsistent. At line 146, the function $f$ is used without prior

Reviewer 03 · Rating 4 · Confidence 2

Strengths

This work may be the first to reveal that, in models trained with predictive coding (PC), the energy is orders of magnitude larger in layers closer to the output. To regulate this energy imbalance and improve test accuracy in deep PC models, including in the case of incremental PC (iPC), this work proposes dynamical precision weightings that depend on both time and layer depth, e.g. spiking precisions, which can achieve performance comparable to backpropagation in deep networks. To achieve the goal of slowing down th

Weaknesses

More details are needed to derive Equations (2) and (3). Are the model and analysis suitable for transformer architectures? If the algorithm's performance is only comparable to that of backpropagation, what is its significance or advantage? It is necessary to first explain the relevant background or significance in detail. The comparison with related algorithms still needs to be strengthened, and the significance of the contribution over related state-of-the-art works should be highlighted. A pseudocod

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis