Information Theory with Kernel Methods
Francis Bach (SIERRA)

TL;DR
This paper introduces kernel-based information measures, such as entropy and mutual information, for probability distributions, enabling efficient estimation and applications in probabilistic inference.
Contribution
It defines kernel-based (von Neumann) entropy, relative entropy and mutual information for covariance operators in reproducing kernel Hilbert spaces, links them to classical information theory, and derives new inference bounds and algorithms from them.
Findings
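Kernel-based (von Neumann) entropy and relative entropy are closely related to Shannon entropy and relative entropy, and share many of their properties.
With tensor product kernels, kernel mutual information characterizes independence exactly, but conditional independence only partially.
The kernel relative entropy yields new upper bounds on log partition functions, usable with convex optimization in variational inference.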
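The first finding concerns the von Neumann entropy -tr[Σ log Σ] of a covariance operator Σ. Below is a minimal plug-in sketch of estimating it from samples. It is our own illustration and not necessarily one of the paper's estimators.

It rests on three assumptions:
- The kernel is Gaussian, so k(x, x) = 1 and the covariance operator has unit trace.
- The data are one-dimensional.
- The bandwidth is a fixed illustrative choice.

The names `gaussian_gram` and `kernel_von_neumann_entropy` are hypothetical. The sketch relies on one standard fact: the empirical operator (1/n) Σ_i φ(x_i) ⊗ φ(x_i) has the same nonzero eigenvalues as the Gram matrix K/n.

```python
import numpy as np


def gaussian_gram(x, bandwidth):
    """Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 * bandwidth^2)) for 1-D samples.

    The Gaussian kernel has k(x, x) = 1, so trace(K / n) = 1 (assumption of this sketch).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dists = (x - x.T) ** 2
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kernel_von_neumann_entropy(x, bandwidth=1.0):
    """Plug-in estimate of -tr[S log S] for the empirical covariance operator S.

    S = (1/n) sum_i phi(x_i) phi(x_i)^T shares its nonzero eigenvalues with K / n,
    so the entropy is computed from the n x n Gram matrix alone.
    """
    n = len(x)
    eigvals = np.linalg.eigvalsh(gaussian_gram(x, bandwidth) / n)
    # Drop numerically zero (or tiny negative) eigenvalues, using the convention 0 log 0 = 0.
    eigvals = eigvals[eigvals > 1e-12]
    return float(-np.sum(eigvals * np.log(eigvals)))


rng = np.random.default_rng(0)
narrow = rng.normal(scale=0.1, size=200)  # concentrated samples
wide = rng.normal(scale=3.0, size=200)    # spread-out samples
print(kernel_von_neumann_entropy(narrow), kernel_von_neumann_entropy(wide))
```

The estimate behaves like an entropy:
- Tightly clustered samples give a Gram matrix close to the all-ones matrix, so a single eigenvalue dominates and the entropy is near 0.
- Well-separated samples give K ≈ I, so the entropy approaches log n.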
Abstract
We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from various oracles on the probability distributions. We also consider product spaces and show that for tensor product kernels, we can define notions of mutual information and joint entropies, which can then characterize independence perfectly, but only partially conditional independence. We finally show how these new notions of relative entropy lead to new upper-bounds on log partition functions, that can be used together with convex optimization within variational inference methods, providing a…
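The link between the operator quantities and Shannon's can be checked in the simplest finite setting. The sketch below makes two assumptions:
- The features are explicit and finite-dimensional. A discrete variable gets one-hot features, i.e. the Dirac kernel with k(x, x') = 1 if x = x' and 0 otherwise. Its covariance operator is then diag(p).
- Kernel mutual information is taken to be the relative entropy between the joint covariance operator and the tensor product of the marginal ones. This is our reading of the definition.

In this diagonal case, von Neumann relative entropy tr[A (log A - log B)] reduces to the Kullback-Leibler divergence, and the kernel mutual information reduces to Shannon mutual information. The helper names are hypothetical. This is a sanity check of the definitions, not the paper's estimator for general kernels.

```python
import numpy as np


def _sym_logm(a, eps=1e-12):
    """Matrix logarithm of a symmetric PSD matrix via eigendecomposition.

    Eigenvalues below eps are clipped to eps; the result is only meaningful
    when the support of A is contained in the support of B.
    """
    w, v = np.linalg.eigh(a)
    w = np.clip(w, eps, None)
    return (v * np.log(w)) @ v.T


def von_neumann_relative_entropy(a, b):
    """D(A || B) = tr[A (log A - log B)] for unit-trace PSD matrices A and B."""
    return float(np.trace(a @ (_sym_logm(a) - _sym_logm(b))))


def kernel_mutual_information(joint_cov, cov_x, cov_y):
    """Relative entropy between the joint covariance and the tensor product of marginals."""
    return von_neumann_relative_entropy(joint_cov, np.kron(cov_x, cov_y))


# Discrete X, Y with one-hot (Dirac-kernel) features: all covariance operators are diagonal.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
# The feature of (x, y) is kron(e_x, e_y), so the joint covariance is diag(p_xy) in row-major order.
joint_cov = np.diag(p_xy.ravel())
mi_kernel = kernel_mutual_information(joint_cov, np.diag(p_x), np.diag(p_y))
mi_shannon = float(np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y))))
print(mi_kernel, mi_shannon)  # the two values coincide in this diagonal case
```

For a product distribution, the joint covariance equals the tensor product of the marginals, so the kernel mutual information is zero.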
Taxonomy
Topics: Machine Learning and Algorithms · Bayesian Modeling and Causal Inference · Statistical Mechanics and Entropy
Methods: Variational Inference
