Minimal Random Code Learning with Mean-KL Parameterization

Jihao Andreas Lin; Gergely Flamich; José Miguel Hernández-Lobato

arXiv:2307.07816·cs.LG·December 5, 2023

TL;DR

This paper introduces a Mean-KL parameterization for compressing variational Bayesian neural networks with MIRACLE. Compared with the conventional mean-variance (Mean-Var) parameterization, it reports faster convergence, improved robustness, and more meaningful variational distributions.

Contribution

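The paper proposes a Mean-KL parameterization for MIRACLE that fixes the compression cost by construction, which removes the need for the annealing procedure used under Mean-Var and improves the robustness and interpretability of the compressed neural network weights.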

Findings

01. Faster convergence in variational training.

02. More robust and meaningful weight distributions.

03. Improved compression performance with heavier tails.

Abstract

This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{w}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{w}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{w} \,\Vert\, P_{w}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{w}$. Instead, we parameterize $Q_{w}$ by its mean and KL divergence from $P_{w}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational…
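The idea of parameterizing $Q_{w}$ by its mean and its KL divergence from $P_{w}$ can be illustrated for a single scalar weight. The sketch below is not the authors' implementation: it assumes univariate Gaussians $Q = \mathcal{N}(\mu_q, \sigma_q^2)$ and $P = \mathcal{N}(\mu_p, \sigma_p^2)$, a fixed KL budget `kappa` in nats, and the solution branch with $\sigma_q \le \sigma_p$. The function names (`gaussian_kl`, `sigma_from_mean_kl`) and the use of bisection are hypothetical choices made for this illustration. Writing $r = \sigma_q^2 / \sigma_p^2$ and $d = (\mu_q - \mu_p)^2 / \sigma_p^2$, the Gaussian KL formula rearranges to $r - \ln r = 2\kappa + 1 - d$, which has a solution only when $d \le 2\kappa$.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL[N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)] in nats."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def sigma_from_mean_kl(mu_q, kappa, mu_p=0.0, sigma_p=1.0, iters=200):
    """Recover sigma_q from a mean and a target KL (illustrative helper).

    Solves r - ln(r) = c for r = sigma_q^2 / sigma_p^2 on (0, 1],
    i.e. the branch with sigma_q <= sigma_p (an assumption of this sketch).
    """
    d = ((mu_q - mu_p) / sigma_p) ** 2
    c = 2.0 * kappa + 1.0 - d
    if c < 1.0:
        # The mean alone already costs more than kappa nats.
        raise ValueError("mean is too far from the prior for this KL budget")

    # f(r) = r - ln(r) - c is decreasing on (0, 1], with
    # f(exp(-c)) = exp(-c) > 0 and f(1) = 1 - c <= 0, so the root is bracketed.
    lo, hi = math.exp(-c), 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) - c > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return sigma_p * math.sqrt(0.5 * (lo + hi))


if __name__ == "__main__":
    sigma_q = sigma_from_mean_kl(mu_q=0.3, kappa=0.5)
    # The recovered (mu_q, sigma_q) pair should hit the KL budget.
    print(sigma_q, gaussian_kl(0.3, sigma_q, 0.0, 1.0))
```

Under these assumptions the KL divergence equals `kappa` for every admissible mean, so no penalty term or annealing schedule is needed to reach the target rate; the only remaining constraint is that the mean stays within $\sigma_p \sqrt{2\kappa}$ of the prior mean.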

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning