Gradient conjugate priors and multi-layer neural networks
Pavel Gurevich, Hannes Stuke

TL;DR
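This paper introduces a gradient conjugate prior (GCP) update that lets neural networks learn the probability distribution of observed data. It connects the classical Bayesian update for conjugate priors to maximizing the log-likelihood of the predictive distribution and analyzes the behavior of the resulting update.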
Contribution
The paper proposes a GCP update for neural networks, a modification of the classical Bayesian update for conjugate priors. It relates this update to maximizing the log-likelihood of the predictive distribution and studies its properties as a dynamical system.
Findings
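The GCP update learns, with deterministic network weights, the parameters of a normal-gamma prior over the unknown mean and variance of the data distribution.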
The dynamical system analysis reveals unique limiting behaviors.
Validation on datasets confirms the method's practical utility.
Abstract
The paper deals with learning probability distributions of observed data by artificial neural networks. We suggest a so-called gradient conjugate prior (GCP) update appropriate for neural networks, which is a modification of the classical Bayesian update for conjugate priors. We establish a connection between the gradient conjugate prior update and the maximization of the log-likelihood of the predictive distribution. Unlike Bayesian neural networks, we use deterministic network weights; instead, we assume that the ground truth distribution is normal with unknown mean and variance, and the neural networks learn the parameters of a prior (a normal-gamma distribution) for this unknown mean and variance. The parameters are updated using the gradient that, at each step, points towards minimizing the Kullback-Leibler divergence from the prior to the posterior…
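To make the objects named in the abstract concrete, here is a minimal Python sketch of the two standard ingredients it refers to: the classical conjugate update of a normal-gamma prior after one observation, and the log-likelihood of the resulting predictive distribution, which for a normal-gamma prior is a Student's t-distribution. The parameter names (m, nu, alpha, beta) and function names are assumptions chosen for illustration, and this is the textbook Bayesian update, not the paper's GCP update itself, whose exact form is not given in the text above.

```python
import math


def normal_gamma_update(m, nu, alpha, beta, x):
    """Classical conjugate update of a normal-gamma prior for one observation x.

    Illustrative parameterization (an assumption, not taken from the paper):
    precision ~ Gamma(alpha, rate=beta), mean | precision ~ N(m, 1 / (nu * precision)).
    """
    m_new = (nu * m + x) / (nu + 1.0)
    nu_new = nu + 1.0
    alpha_new = alpha + 0.5
    beta_new = beta + nu * (x - m) ** 2 / (2.0 * (nu + 1.0))
    return m_new, nu_new, alpha_new, beta_new


def predictive_log_likelihood(m, nu, alpha, beta, x):
    """Log-density at x of the predictive distribution under a normal-gamma prior.

    The predictive is Student's t with 2 * alpha degrees of freedom,
    location m, and squared scale beta * (nu + 1) / (alpha * nu).
    """
    df = 2.0 * alpha
    scale2 = beta * (nu + 1.0) / (alpha * nu)
    z2 = (x - m) ** 2 / scale2
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi * scale2)
        - (df + 1.0) / 2.0 * math.log1p(z2 / df)
    )


# Usage: score an observation under the prior, then apply the classical update.
prior = (0.0, 1.0, 1.0, 1.0)  # hypothetical starting values for (m, nu, alpha, beta)
x = 2.0
print(predictive_log_likelihood(*prior, x))
print(normal_gamma_update(*prior, x))
```

In a neural-network setting, the four parameters would be outputs of networks evaluated at an input, and a gradient-based method would adjust the weights to increase the predictive log-likelihood; how the GCP update relates to that objective is the subject of the paper.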
