MINDE: Mutual Information Neural Diffusion Estimation

Giulio Franzese; Mustapha Bounoua; Pietro Michiardi

arXiv:2310.09031 · cs.LG · May 16, 2024 · 1 citation

Open Access · 1 repository · 1 video · 3 reviews

TL;DR

This paper introduces MINDE, a neural, diffusion-based method for estimating the mutual information (MI) between random variables, as well as their entropy, and reports better accuracy and self-consistency than existing techniques.

Contribution

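The paper presents a new MI estimation approach built on score-based diffusion models and grounded in an interpretation of the Girsanov theorem, under which the Kullback-Leibler (KL) divergence between two densities is expressed through the difference of their score functions. This yields more accurate and more consistent estimates.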
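
The KL-through-scores idea can be checked on a toy case where every score is known in closed form. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the simplest forward SDE, dX = dW (zero drift, g(t) = 1), one-dimensional Gaussian densities, and exact scores in place of trained score networks. It uses the relation KL(p ‖ q) = ∫_0^T g(t)²/2 · E_{p_t}[(∇log p_t − ∇log q_t)²] dt + KL(p_T ‖ q_T), which holds when both densities are diffused by the same forward SDE and is our reading of the Girsanov-based building block. All function names are hypothetical.

```python
import math


def gaussian_kl(m_p, v_p, m_q, v_q):
    """Closed-form KL(N(m_p, v_p) || N(m_q, v_q)) for 1-D Gaussians (v = variance)."""
    return 0.5 * (math.log(v_q / v_p) + (v_p + (m_p - m_q) ** 2) / v_q - 1.0)


def expected_score_gap(t, m_p, v_p, m_q, v_q):
    """E_{x ~ p_t}[(score_p_t(x) - score_q_t(x))^2] when both densities follow dX = dW.

    Under this (assumed) forward SDE, N(m, v) diffuses to N(m, v + t),
    whose score is -(x - m) / (v + t).
    """
    a, b = v_p + t, v_q + t  # marginal variances of p_t and q_t
    delta = m_p - m_q        # the means do not move under dX = dW
    return a * (1.0 / b - 1.0 / a) ** 2 + (delta / b) ** 2


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n sub-intervals (rounded up to even)."""
    n += n % 2
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def kl_via_scores(m_p, v_p, m_q, v_q, T=10.0, n=2000):
    """KL(p || q) as a time integral of the squared score gap plus the terminal KL (g = 1)."""
    integral = simpson(lambda t: 0.5 * expected_score_gap(t, m_p, v_p, m_q, v_q), 0.0, T, n)
    return integral + gaussian_kl(m_p, v_p + T, m_q, v_q + T)


if __name__ == "__main__":
    # Score-based value vs. closed form for p = N(0, 1), q = N(1.5, 2).
    print(kl_via_scores(0.0, 1.0, 1.5, 2.0), gaussian_kl(0.0, 1.0, 1.5, 2.0))
```

The two printed values should agree up to quadrature error. The terminal term is kept so the identity is exact for any T; it shrinks as T grows, since both diffused densities approach the same broad Gaussian.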

Findings

01. Outperforms existing MI estimation methods on challenging distributions
02. Passes MI self-consistency tests such as data processing and additivity
03. Provides a unified framework for estimating MI and entropy using diffusion models
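
The entropy side of this framework follows from the KL building block. One way to write it, assuming a Gaussian reference γ = N(0, σ²I_d) on ℝ^d (an illustrative choice; the paper's exact reference and SDE may differ), is the exact decomposition below. The only unknown is the KL term, which is what a score-difference estimator targets, since the reference's score is available in closed form.

```latex
\begin{aligned}
H(p) &= -\mathbb{E}_{p}\left[\log \gamma(X)\right] - \mathrm{KL}(p \,\|\, \gamma) \\
     &= \frac{d}{2}\log\left(2\pi\sigma^{2}\right) + \frac{\mathbb{E}_{p}\|X\|^{2}}{2\sigma^{2}} - \mathrm{KL}(p \,\|\, \gamma)
\end{aligned}
```

As a sanity check, for p = N(0, σ²I_d) the KL term vanishes, E_p‖X‖² = dσ², and the expression reduces to (d/2) log(2πeσ²), the Gaussian entropy.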

Abstract

In this work we present a new method for the estimation of Mutual Information (MI) between random variables. Our approach is based on an original interpretation of the Girsanov theorem, which allows us to use score-based diffusion models to estimate the Kullback-Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables. Armed with such building blocks, we present a general recipe to measure MI, which unfolds in two directions: one uses conditional diffusion processes, whereas the other uses joint diffusion processes that allow simultaneous modelling of two random variables. Our results, which derive from a thorough experimental protocol over all the variants of our approach, indicate that our method is more accurate than the main alternatives from the literature,…
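
The conditional direction of the MI recipe can be illustrated on a jointly Gaussian pair, where I(X;Y) = −½ log(1 − ρ²) is known exactly. This sketch writes I(X;Y) = E_y KL(p(x|y) ‖ p(x)) and evaluates each KL as a time integral of g(t)²/2 times the expected squared gap between the conditional and unconditional scores, plus a terminal KL. It assumes the toy forward SDE dX = dW applied to X only (Y is held fixed) and substitutes closed-form scores for learned networks. It illustrates the idea, not the paper's estimator or its noise schedule; all names are hypothetical.

```python
import math


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n sub-intervals (rounded up to even)."""
    n += n % 2
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def conditional_score_gap(t, rho):
    """E_{x,y}[(score of p_t(x|y) - score of p_t(x))^2] for a correlated Gaussian pair.

    X and Y are standard normals with correlation rho. Only X is diffused with dX = dW,
    so p_t(x|y) = N(rho*y, 1 - rho^2 + t) and p_t(x) = N(0, 1 + t).
    """
    a = 1.0 - rho ** 2 + t  # conditional variance at time t
    b = 1.0 + t             # marginal variance at time t
    # The mean gap rho*y contributes E[(rho*y)^2] / b^2 = rho^2 / b^2 since y ~ N(0, 1).
    return a * (1.0 / b - 1.0 / a) ** 2 + rho ** 2 / b ** 2


def mi_conditional(rho, T=10.0, n=2000):
    """I(X;Y) = E_y KL(p(x|y) || p(x)) as a score-gap integral plus a terminal term (g = 1)."""
    integral = simpson(lambda t: 0.5 * conditional_score_gap(t, rho), 0.0, T, n)
    # E_y KL(N(rho*y, 1 - rho^2 + T) || N(0, 1 + T)) simplifies to this closed form.
    terminal = 0.5 * math.log((1.0 + T) / (1.0 - rho ** 2 + T))
    return integral + terminal


if __name__ == "__main__":
    rho = 0.8
    # Score-based value vs. the exact Gaussian MI.
    print(mi_conditional(rho), -0.5 * math.log(1.0 - rho ** 2))
```

The terminal term shrinks toward zero as T grows, which is why a practical estimator could plausibly neglect it; here it is kept so the check is exact for any T.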

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 6: marginally above the acceptance threshold · Confidence 3

Strengths

I really like that the authors used the Czyz benchmark data, and also the consistency tests. I also appreciate the creativity of the theoretical advancement, though I don't understand it (see below).

Weaknesses

I was super excited to read this paper, because I love thinking about mutual information and entropy, and have recently been working on some related issues. The ideas are intriguing, and the results are impressive. So, the rest of this review will focus on the issues I had understanding the methods and results. 1. The biggest issue for me is that I almost immediately got lost. I know information theory pretty well; I learned it from Fred Jelinek before he died. That said, I know very little…

Reviewer 02 · Rating 6: marginally above the acceptance threshold · Confidence 2

Strengths

1. The construction of the basic building blocks that establish the estimation of KL divergence and of the entropy is well organized and clearly written. 2. It’s interesting to see the SDE framework of diffusion models being used under the setting of MI estimation, which could inspire the research community to investigate diffusion models in new directions.

Weaknesses

1. While the utilization of score-based diffusion models can be justified by the Girsanov Theorem, it’s unclear how they are used as **generative models** (*i.e.*, using the reverse-time SDE to generate samples); it seems that only forward diffusion SDEs are needed in order to train the score networks. Therefore, it’s a bit confusing when the authors wrote “we explore the problem of estimating MI using generative models” (Page 1), instead of something like “we explore the problem of estimating…

Reviewer 03 · Rating 6: marginally above the acceptance threshold · Confidence 2

Strengths

The problem considered is of critical importance in several applied and theoretical fields. Existing estimators either fail in high dimensions or require large amounts of data to provide precise estimates. The results are quite impressive; the proposed estimator seems to outperform alternatives in most settings. Several aspects that make MI estimation challenging, originally introduced in [1], such as sparsity, dimensionality, long tails, transformations, data process…

Weaknesses

The organization of the paper makes it hard to follow. The measure-theoretic notation makes the paper inaccessible to the broader audience interested in using the estimator in applied settings. The contributions are not fully clear. The connections between score, KL, MI, and H existed before. In addition, it's an established fact that diffusion models are more powerful density estimators, especially in higher dimensions, making it less surprising that the MI and H estimators are super…

Code & Models

Repositories

MustaphaBounoua/minde · JAX · Official

Videos

MINDE: Mutual Information Neural Diffusion Estimation · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques

Methods: Diffusion