Linear cost mutual information estimation and independence test of similar performance as HSIC
Jarek Duda, Jagoda Bracha, Adrian Przybysz

TL;DR
This paper introduces a linear-time method for mutual information estimation and independence testing that rivals HSIC in sensitivity, using Hierarchical Correlation Reconstruction to model dependencies efficiently.
Contribution
It presents HCR as a practical, linear-cost alternative to HSIC for dependency testing, providing joint distribution models and mutual information approximation.
Findings
In the authors' tests, HCR shows sensitivity to dependencies similar to or higher than that of HSIC.
Mutual information can be approximated efficiently using mixed moments.
The method's cost scales linearly with sample size, making it suitable for large datasets.
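The findings above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names are hypothetical, each variable is normalized to approximately uniform on [0, 1] through its empirical ranks, the basis is the first three rescaled Legendre polynomials (orthonormal on [0, 1]), and the normalization constant and units of the mutual information approximation are left unspecified. Note that the rank step here uses sorting, which costs O(n log n); only the moment estimation itself is linear in n.

```python
import numpy as np

def to_uniform(x):
    """Map a sample to (0, 1) via its empirical ranks (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(len(x))  # sorting: O(n log n)
    return (ranks + 0.5) / len(x)

def legendre_basis(u):
    """First three Legendre polynomials rescaled to be orthonormal on [0, 1]."""
    return np.stack([
        np.sqrt(3.0) * (2 * u - 1),
        np.sqrt(5.0) * (6 * u**2 - 6 * u + 1),
        np.sqrt(7.0) * (20 * u**3 - 30 * u**2 + 12 * u - 1),
    ])

def mixed_moments(x, y):
    """3x3 matrix a[j, k] = mean of f_j(u) * f_k(v); linear in n for fixed degree."""
    fu = legendre_basis(to_uniform(x))  # shape (3, n)
    fv = legendre_basis(to_uniform(y))  # shape (3, n)
    return fu @ fv.T / len(x)

def dependence_score(x, y):
    """Sum of squared mixed moments, the quantity the summary relates to
    mutual information (constant factor and units not fixed here)."""
    return float(np.sum(mixed_moments(x, y) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=20000)
    print(dependence_score(x, rng.normal(size=20000)))            # independent pair
    print(dependence_score(x, x + 0.5 * rng.normal(size=20000)))  # dependent pair
```

In this sketch the entry `a[0, 0]` behaves like a rank correlation, while entries such as `a[0, 1]` respond to the spread of one variable changing with the other, in line with the "correlation and homoscedasticity" features named in the abstract. For independent samples each entry fluctuates around zero with variance of roughly 1/n, so the score stays close to 9/n for this 3x3 basis.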
Abstract
Evaluating statistical dependencies between two data samples is a basic problem of data science/machine learning, and HSIC (Hilbert-Schmidt Independence Criterion)~\cite{HSIC} is considered the state-of-the-art method. However, for a data sample of size $n$ it requires multiplication of $n \times n$ matrices, which currently needs $\sim O(n^{2.37})$ computational complexity~\cite{mult}, making it impractical for large data samples. We discuss HCR (Hierarchical Correlation Reconstruction) as its linear-cost practical alternative, which in tests shows even higher sensitivity to dependencies and additionally provides an actual joint distribution model for a chosen significance level, by describing dependencies through features that are mixed moments, starting with correlation and homoscedasticity. It also allows approximating mutual information as just a sum of squares of such nontrivial mixed moments between…
