Bidirectional predictive coding
Gaspard Oliviers, Mufeng Tang, Rafal Bogacz

TL;DR
This paper introduces bidirectional predictive coding (bPC), a model combining generative and discriminative inference, improving performance in visual learning tasks and aligning more closely with biological visual processing.
Contribution
The paper proposes a novel bidirectional predictive coding model that integrates generative and discriminative inference while maintaining biological plausibility and enhancing task performance.
Findings
bPC matches or outperforms unidirectional models in specialized tasks
bPC excels in multimodal learning and inference with missing data
bPC aligns more closely with biological visual inference
Abstract
Predictive coding (PC) is an influential computational model of visual learning and inference in the brain. Classical PC was proposed as a top-down generative model, where the brain actively predicts upcoming visual inputs, and inference minimises the prediction errors. Recent studies have also shown that PC can be formulated as a discriminative model, where sensory inputs predict neural activities in a feedforward manner. However, experimental evidence suggests that the brain employs both generative and discriminative inference, while unidirectional PC models show degraded performance in tasks requiring bidirectional processing. In this work, we propose bidirectional PC (bPC), a PC model that incorporates both generative and discriminative inference while maintaining a biologically plausible circuit implementation. We show that bPC matches or outperforms unidirectional models in their…
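To make the idea of combining the two inference directions concrete, here is a minimal sketch. It is not the authors' implementation, and this page does not show the paper's actual energy function. The sketch assumes a linear two-layer model with no activation functions, in which hypothetical top-down weights `V` predict the sensory layer `x0` from the latent layer `x1`, hypothetical bottom-up weights `W` predict `x1` from `x0`, and both squared prediction errors are weighted equally. Inference clamps `x0`, initialises `x1` with a feedforward sweep, and then runs gradient descent on the summed error.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def errors(x0, x1, V, W):
    """Return (top-down error on x0, bottom-up error on x1)."""
    eps_gen = [a - p for a, p in zip(x0, matvec(V, x1))]   # x0 - V x1
    eps_disc = [a - p for a, p in zip(x1, matvec(W, x0))]  # x1 - W x0
    return eps_gen, eps_disc

def energy(x0, x1, V, W):
    """Assumed energy: equally weighted sum of both squared errors."""
    eps_gen, eps_disc = errors(x0, x1, V, W)
    return 0.5 * sum(e * e for e in eps_gen) + 0.5 * sum(e * e for e in eps_disc)

def infer(x0, V, W, steps=50, lr=0.1):
    """Clamp x0 and relax x1 by gradient descent on the energy."""
    x1 = matvec(W, x0)                   # feedforward initialisation
    Vt = [list(col) for col in zip(*V)]  # transpose of V
    trace = [energy(x0, x1, V, W)]
    for _ in range(steps):
        eps_gen, eps_disc = errors(x0, x1, V, W)
        back = matvec(Vt, eps_gen)       # V^T (x0 - V x1)
        # dE/dx1 = -V^T eps_gen + eps_disc, so step in the opposite direction
        x1 = [x + lr * (b - d) for x, b, d in zip(x1, back, eps_disc)]
        trace.append(energy(x0, x1, V, W))
    return x1, trace

# Illustrative random weights and input; sizes and ranges are arbitrary choices.
random.seed(0)
n0, n1 = 4, 3
V = [[random.uniform(-0.5, 0.5) for _ in range(n1)] for _ in range(n0)]
W = [[random.uniform(-0.5, 0.5) for _ in range(n0)] for _ in range(n1)]
x0 = [random.uniform(-1.0, 1.0) for _ in range(n0)]

x1, trace = infer(x0, V, W)
print("energy before and after inference:", trace[0], trace[-1])
```

With the feedforward initialisation the bottom-up error starts at zero, so the relaxation trades a small bottom-up error for a reduced top-down error. The step size of 0.1 is small enough for these weight ranges that the energy cannot increase between iterations.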
Peer Reviews
Decision: ICLR 2026 Poster
1. The bidirectional predictive coding formulation is conceptually simple, elegant, and aligns more closely with classical interactive cortical models. It is worth emphasizing, however, that in Rao and Ballard’s model, the initial feedforward sweep does convey bottom-up evidence (the bottom-up prediction), and only subsequent iterations carry the prediction residues or prediction error signals. Thus, the primary difference is in the steady-state treatment and symmetry of information flow. 2. Th
1. The conceptual innovation is modest and closely related to earlier models of bidirectional inference (interactive activation, adaptive resonance, hierarchical Bayesian models). The paper could better situate itself in this lineage. 2. Although the results show benefits over prior predictive-coding variants, scalability to real-world architectures and large-scale deep learning benchmarks remains unclear; current experiments appear focused on moderate-scale or simplified tasks. 3. Historicall
1. The idea of integrating bidirectional inference under a unified predictive coding framework is conceptually clear and aligns with prior theoretical literature. 2. The paper reports a broad range of experiments, including classification, image generation, multimodal learning, and occlusion robustness, showing some effort toward comprehensive evaluation.
1. The central idea of combining discriminative and generative pathways within a shared latent representation is not novel in the context of modern machine learning. Many recent architectures, such as VAVAE[1] or VAR[2], already unify discriminative and generative learning with better theoretical grounding and empirical performance. The proposed bPC model appears to be a minor variation of existing predictive coding formulations (discPC, genPC, hybridPC) with shared weights, rather than a fundam
This is a great paper. Conceptually simple, though with many smaller innovations that are only mentioned in passing, the presented work marks a great step towards highly functional predictive coding networks, fixing previous issues with capabilities and, importantly, scaling. The work is exceptionally thorough, well designed and, for the most part, very clear (see below). Also noteworthy is the work in the appendix, where the models are scaled up to large networks and many-class classification benchmarks.
Minor issues: - The issue of symmetric/shared bottom-up/top-down connections (l154) is a bit exaggerated, as it is easy to show that, as long as identical local learning signals can be applied at either end, weight decay can result in symmetric weights (Tim Lillicrap wrote on this). - Section 4.5 lacks clarity, in particular on what exactly is shown in Figure 7. What am I seeing here, and what is the paradigm? - The issue of scaling, which this approach solves to a large extent, is only mentioned
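The reviewer's remark about weight decay producing symmetric weights can be illustrated with a small sketch. It is an illustration of the general argument only, not code from the paper or from the work the reviewer alludes to. The names `W`, `B`, `G`, `step`, and `asymmetry` are hypothetical, and the shared local learning signal is replaced by random numbers. If a forward matrix `W` and a backward matrix `B` receive the same update (transposed for `B`) and both decay by the same factor, the difference `W - B^T` shrinks by `(1 - decay)` at every step, whatever the updates are.

```python
import random

def asymmetry(W, B):
    """Largest absolute entry of W - B^T."""
    return max(abs(W[i][j] - B[j][i])
               for i in range(len(W)) for j in range(len(W[0])))

def step(W, B, G, lr=0.1, decay=0.05):
    """Apply the same update G to W and, transposed, to B, with weight decay."""
    rows, cols = len(W), len(W[0])
    W_new = [[(1 - decay) * W[i][j] + lr * G[i][j] for j in range(cols)]
             for i in range(rows)]
    B_new = [[(1 - decay) * B[j][i] + lr * G[i][j] for i in range(rows)]
             for j in range(cols)]
    return W_new, B_new

random.seed(1)
rows, cols = 3, 4
# Forward (rows x cols) and backward (cols x rows) weights start unrelated.
W = [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
B = [[random.uniform(-1, 1) for _ in range(rows)] for _ in range(cols)]
initial = asymmetry(W, B)

for _ in range(200):
    # Random stand-in for a learning signal available at both ends (assumption).
    G = [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    W, B = step(W, B, G)

final = asymmetry(W, B)
print("asymmetry before and after:", initial, final)
```

The shared update cancels in the difference, so only the decay term acts on `W - B^T`. After 200 steps with a decay of 0.05 the initial mismatch is scaled by roughly `0.95 ** 200`, which is about `3.5e-5`.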
Taxonomy
Topics: Face Recognition and Perception · Embodied and Extended Cognition · Neural dynamics and brain function
