TL;DR
This paper introduces EDGE, a scalable non-parametric mutual information estimator with O(N) computational complexity and an O(1/N) parametric MSE rate, which makes it practical for analyzing information flow in deep neural networks.
Contribution
The paper presents EDGE as the first MI estimator that runs in linear time while attaining the parametric MSE rate. It combines randomized locality sensitive hashing (LSH), dependency graphs, and ensemble bias reduction.
Findings
EDGE achieves O(N) complexity and O(1/N) MSE rate.
EDGE enables analysis of information flow in deep neural networks.
The estimator clarifies the information bottleneck controversy in DNNs.
Abstract
Mutual information (MI) is a widely used measure of dependency between two random variables, with applications in information theory, statistics, and machine learning. Recently, several MI estimators have been proposed that can achieve the parametric MSE convergence rate. However, most of the previously proposed estimators have a high computational complexity of at least O(N^2). We propose a unified method for empirical non-parametric estimation of a general MI function between random vectors in R^d based on N i.i.d. samples. The reduced-complexity MI estimator, called the ensemble dependency graph estimator (EDGE), combines randomized locality sensitive hashing (LSH), dependency graphs, and ensemble bias-reduction methods. We prove that EDGE achieves the optimal computational complexity O(N), and can achieve the optimal parametric MSE rate of O(1/N) if the density is d times…
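To make the hashing-and-counting idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a simple randomly shifted grid hash with a hypothetical cell width `eps`, uses the plain log plug-in form of MI, and omits the ensemble bias-reduction step entirely. The function names `hash_bucket` and `hashed_mi` are illustrative only. Samples of X and Y are hashed into buckets, the buckets act as vertices of a bipartite dependency graph, and the joint counts act as edge weights. Hashing and counting each take one pass over the N samples.

```python
import numpy as np
from collections import Counter


def hash_bucket(Z, eps, offset):
    """Map each row of Z to a grid-cell key: floor((z + offset) / eps)."""
    cells = np.floor((Z + offset) / eps).astype(np.int64)
    return [tuple(row) for row in cells]


def hashed_mi(X, Y, eps=0.5, seed=0):
    """Plug-in MI estimate (in nats) from hashed bucket counts.

    Assumption: a shifted grid hash stands in for the LSH step, and no
    ensemble bias correction is applied.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n = len(X)
    rng = np.random.default_rng(seed)
    hx = hash_bucket(X, eps, rng.uniform(0, eps, X.shape[1]))
    hy = hash_bucket(Y, eps, rng.uniform(0, eps, Y.shape[1]))

    # Vertex weights (marginal counts) and edge weights (joint counts).
    nx, ny, nxy = Counter(hx), Counter(hy), Counter(zip(hx, hy))

    # Sum only over occupied bucket pairs, i.e. the edges of the graph.
    mi = 0.0
    for (i, j), nij in nxy.items():
        mi += (nij / n) * np.log(nij * n / (nx[i] * ny[j]))
    return mi


rng = np.random.default_rng(1)
x = rng.normal(size=5000)
mi_dependent = hashed_mi(x, x + 0.1 * rng.normal(size=5000))
mi_independent = hashed_mi(x, rng.normal(size=5000))
print(mi_dependent, mi_independent)
```

The dependent pair should yield a clearly larger estimate than the independent pair. The small positive value that remains in the independent case is the kind of finite-sample bias that an ensemble correction is meant to reduce.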
