Sample as You Infer: Predictive Coding With Langevin Dynamics
Umais Zahid, Qinghai Guo, Zafeirios Fountas

TL;DR
This paper introduces a novel predictive coding-based algorithm for deep generative models that leverages Langevin dynamics, outperforming standard VAEs in sample quality and training efficiency.
Contribution
It proposes a Langevin sampling approach within predictive coding, enhanced with encoder warm-starts and preconditioning, to improve training and sample quality in deep generative models.
Findings
Outperforms standard VAEs in sample quality.
Converges faster than traditional SGD-based training.
Provides a robust, encoder-free training method.
Abstract
We present a novel algorithm for parameter learning in generic deep generative models that builds upon the predictive coding (PC) framework of computational neuroscience. Our approach modifies the standard PC algorithm to bring performance on par with, and exceeding, that obtained from standard variational auto-encoder (VAE) training. By injecting Gaussian noise into the PC inference procedure we re-envision it as an overdamped Langevin sampling, which facilitates optimisation with respect to a tight evidence lower bound (ELBO). We improve the resultant encoder-free training method by incorporating an encoder network to provide an amortised warm-start to our Langevin sampling and test three different objectives for doing so. Finally, to increase robustness to the sampling step size and reduce sensitivity to curvature, we validate a lightweight and easily computable form of preconditioning,…
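The noise-injection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear-Gaussian model (x = W z + noise, with z ~ N(0, I)), and the names `grad_log_joint`, `langevin_sample`, and `exact_posterior_mean` are hypothetical. Each step is a gradient ascent step on the log joint over the latent z plus Gaussian noise scaled by sqrt(2 * step). Dropping the noise term leaves plain gradient-based inference of a point estimate.

```python
import numpy as np

def grad_log_joint(z, x, W, sigma2):
    """Gradient w.r.t. z of log p(x, z) for x = W z + noise, z ~ N(0, I)."""
    return W.T @ (x - W @ z) / sigma2 - z

def langevin_sample(x, W, sigma2, z0, step, n_steps, rng):
    """Unadjusted overdamped Langevin chain over the latent z (toy sketch)."""
    z = z0.copy()
    samples = np.empty((n_steps, z.size))
    for t in range(n_steps):
        noise = rng.standard_normal(z.size)
        # Gradient step on the log joint plus injected Gaussian noise.
        z = z + step * grad_log_joint(z, x, W, sigma2) + np.sqrt(2.0 * step) * noise
        samples[t] = z
    return samples

def exact_posterior_mean(x, W, sigma2):
    """Closed-form posterior mean, available only because the toy model is linear-Gaussian."""
    precision = np.eye(W.shape[1]) + W.T @ W / sigma2
    return np.linalg.solve(precision, W.T @ x / sigma2)
```

In this toy setting the chain average can be checked against the closed-form posterior mean. For a deep generative model no such closed form exists, which is where sampling becomes useful.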
Peer Reviews
Decision·ICML 2024 Poster
The paper shows that the ideas in the predictive coding framework have quite practical applications and consequences in machine learning. The method is explained very clearly and the experiments are well-chosen to show that it is indeed competitive with VAEs on realistic datasets.
That the predictive coding framework, when noise is added, can be treated as overdamped Langevin sampling is a novel and interesting result. What is less clear is whether it's significant enough, as the result seems to be an algorithm that's only competitive with well-established VAE algorithms. While the method is clear, quite a bit of the paper is spent on background material, much of which is not even used in the rest of the paper. For example, lots of space is spent on discussing all the different div
I'm generally fine with this, SGLD has been used a lot for training Bayesian neural networks, so most of the literature is centered around how to overcome that particular problem (no dataset). This feels like a hole in the literature that is getting plugged, which is good.
* ULD for training generative models
* Approximate inference warm start network
* Interesting Adam-based preconditioning, figure 3 makes sense if preconditioner is working.
* Models converge faster on high-dim experiments
I think the method is fine enough. What's really bothering me is the high-dim experiments and their scores -- what are/were the limitations here?
* Comparing only against the VAE seems like it's just not enough, especially when the results compared to the VAE are not especially convincing.
* Preconditioner does not always help, and when it does it's not by that much. Unclear why and when it is supposed to help empirically.
All things considered, I'm not very convinced that the method is benefic
- **The main ideas of the paper are presented clearly.** The paper takes ideas from predictive coding, variational inference and Langevin diffusion, and the authors present the connections between these ideas in an illuminating way. **The adaptive preconditioning method for Langevin steps also seems quite novel.**
- **No discussion of related works is provided/important references are missing.** Despite citing many recent developments in the fields of predictive coding and gradient-based Monte Carlo methods, no further discussion is made on how this work is connected to and/or provides novel perspectives compared to past work. As a consequence, I believe several important references are missing, for example (Hoffman 2017) is the original paper that explored the practical use of MCMC methods for deep laten
The proposed training method goes beyond the pointwise estimation of the latent variables that is typical of predictive coding. In principle, the Langevin dynamics allow sampling from the posterior of the latent variables given the data and the parameters.
1. The main weakness lies in the fact that the proposed algorithm is not compared to vanilla predictive coding training. Without this comparison it is impossible to establish whether adding the randomness actually brings any benefit.
2. While it is said that fewer epochs are needed to learn, no assessment is made of the computational cost of running several steps of Langevin dynamics for every parameter step, nor of the need to use a VAE each time the Langevin iterations have to be initialized. A comparison
This paper provides thorough theoretical justification for its methodology. It investigates and compares several different approaches to training the amortised warm-start model. Several important ablations are done (such as the diagonal preconditioning). The experiment showing improved robustness to increased Langevin step size when using preconditioning, across several image datasets, is important and interesting.
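To make the step-size robustness point concrete, here is a toy sketch. It is not the paper's preconditioner, which the reviews describe as adaptive and Adam-based. This example instead assumes a fixed, known diagonal preconditioner on a badly scaled two-dimensional Gaussian target, and the names `langevin_step` and `run_chain` are hypothetical. With the same step size, the plain chain blows up along the low-variance coordinate while the preconditioned chain stays bounded.

```python
import numpy as np

def langevin_step(z, grad, step, precond, rng):
    """One Langevin step with a fixed diagonal preconditioner (vector `precond`)."""
    noise = rng.standard_normal(z.size)
    # Both the drift and the noise are rescaled per coordinate.
    return z + step * precond * grad + np.sqrt(2.0 * step * precond) * noise

def run_chain(variances, step, precond, n_steps, seed=0):
    """Run a chain targeting N(0, diag(variances)); return the largest |z| seen."""
    rng = np.random.default_rng(seed)
    z = np.zeros(variances.size)
    max_abs = 0.0
    for _ in range(n_steps):
        grad = -z / variances  # gradient of the Gaussian log-density
        z = langevin_step(z, grad, step, precond, rng)
        max_abs = max(max_abs, float(np.abs(z).max()))
    return max_abs

# Assumed toy target: one coordinate has 100x smaller variance than the other.
variances = np.array([0.01, 1.0])
plain = run_chain(variances, step=0.5, precond=np.ones(2), n_steps=50)
scaled = run_chain(variances, step=0.5, precond=variances, n_steps=50)
```

Here the preconditioner equals the target variances, which is only possible because the target is known. An adaptive scheme would have to estimate a comparable scaling from gradients during sampling.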
See questions.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Single-cell and spatial transcriptomics
Methods: Stochastic Gradient Descent
