Learning Representations for Independence Testing

Nathaniel Xu; Feng Liu; Danica J. Sutherland

arXiv:2409.06890·stat.ML·March 23, 2026

Open Access · 3 Reviews

TL;DR

This paper explores learning representations to improve independence testing, leveraging variational mutual information estimators and deep kernels to enhance detection power in high-dimensional, complex distributions.

Contribution

It introduces a method to learn representations that maximize test power, connecting variational mutual information bounds with kernel-based HSIC tests, and corrects misconceptions in existing HSIC optimization.

Findings

01

Optimized HSIC tests outperform other methods on structured dependence problems.

02

Deep kernel learning enhances the power of independence tests.

03

Variational mutual information estimators enable finite-sample valid tests.
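As a rough illustration of how a variational estimator can yield a finite-sample valid test, the sketch below computes an InfoNCE estimate and calibrates it with a permutation test. It is a minimal sketch under stated assumptions, not the authors' implementation: the critic `dot_critic` is a fixed, hypothetical stand-in (the paper's setting would learn the critic, presumably on data held out from the test), and all function names are illustrative. Validity comes from the permutation p-value itself, which is valid for any fixed statistic when the permuted samples are exchangeable under the null.

```python
import numpy as np

def infonce_statistic(x, y, critic):
    """InfoNCE lower-bound estimate of mutual information from paired samples."""
    scores = critic(x, y)  # scores[i, j] = f(x_i, y_j)
    n = scores.shape[0]
    # Numerically stable log-sum-exp over each row.
    row_max = scores.max(axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    # mean_i [ f(x_i, y_i) - log( (1/n) * sum_j exp f(x_i, y_j) ) ]
    return float(np.mean(np.diag(scores) - log_sum) + np.log(n))

def permutation_pvalue(x, y, statistic, n_perms=200, seed=0):
    """Permutation p-value: shuffle y to break any pairing with x."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perms):
        perm = rng.permutation(len(y))
        exceed += statistic(x, y[perm]) >= observed
    # The +1 terms count the observed statistic among the permutations,
    # which is what makes the p-value valid at any sample size.
    return (1 + exceed) / (1 + n_perms)

def dot_critic(x, y):
    """Hypothetical fixed critic f(x, y) = <x, y>; a learned network would go here."""
    return x @ y.T
```

Rejecting when the returned p-value is at most the chosen level gives a test whose false-positive rate is controlled regardless of how well the critic was chosen; the critic only affects power.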

Abstract

Many tools exist to detect dependence between random variables, a core question across a wide range of machine learning, statistical, and scientific endeavors. Although several statistical tests guarantee eventual detection of any dependence with enough samples, standard tests may require an exorbitant amount of samples for detecting subtle dependencies between high-dimensional random variables with complex distributions. In this work, we study two related ways to learn powerful independence tests. First, we show how to construct powerful statistical tests with finite-sample validity by using variational estimators of mutual information, such as the InfoNCE or NWJ estimators. Second, we establish a close connection between these variational mutual information-based tests and tests based on the Hilbert-Schmidt Independence Criterion (HSIC); in particular, learning a variational bound…
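For readers unfamiliar with HSIC-based testing, the following is a minimal sketch of a permutation test built on the biased (V-statistic) HSIC estimator. It assumes Gaussian kernels with fixed bandwidths `bw_x` and `bw_y`, which are illustrative choices; the approach described in the abstract would instead learn the kernels or representations, and nothing here reproduces that learning step. Function names are hypothetical.

```python
import numpy as np

def gaussian_gram(z, bandwidth):
    """Gram matrix of a Gaussian kernel on the rows of z."""
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))

def hsic_biased(K, L):
    """Biased HSIC estimate: trace(K H L H) / n^2 with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H) / n ** 2)

def hsic_permutation_test(x, y, bw_x=1.0, bw_y=1.0, n_perms=200, seed=0):
    """Return the HSIC estimate and a permutation p-value."""
    rng = np.random.default_rng(seed)
    K = gaussian_gram(x, bw_x)
    L = gaussian_gram(y, bw_y)
    observed = hsic_biased(K, L)
    exceed = 0
    for _ in range(n_perms):
        perm = rng.permutation(len(y))
        # Permuting rows and columns of L is equivalent to shuffling y.
        exceed += hsic_biased(K, L[np.ix_(perm, perm)]) >= observed
    return observed, (1 + exceed) / (1 + n_perms)
```

With poorly chosen bandwidths this test can need many samples to detect structured dependence, which is the motivation the abstract gives for learning the kernel rather than fixing it.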

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 3

Strengths

The paper is technically well presented, and the claims are well supported, theoretically and experimentally. It introduces novel approaches for learning representations that enhance the power of independence tests, particularly in high-dimensional settings, improving upon traditional methods. Additionally, the work demonstrates the theoretical validity and empirical effectiveness of the proposed approaches, complemented by extensive evaluations that reinforce its contributions to the field.

Weaknesses

- A notable issue is the lack of clarity regarding the aims and achievements of the paper. For instance, while it claims to present two approaches for independence testing, the conclusion suggests that it only studies one approach, leading to confusion about the overall contribution and focus of the work.
- The performance of HSIC-based tests is significantly influenced by the choice of kernel. While the paper proposes methods to optimize kernel selection, identifying the appropriate kernel for

Reviewer 02 · Rating 6 · Confidence 4

Strengths

This paper is very well written and I was able to read it smoothly. The idea of minimizing the asymptotic power instead of the dependence metric itself is interesting. It is demonstrated in the experiments that their approach improves HSIC's power in high-dimensional problems.

Weaknesses

A few concerns I have for this paper: 1. It seems to me that the authors didn't highlight their core contribution enough. The InfoNCE approach already exists in the literature, and making an independence test out of it based on permutation is not very interesting; neither is applying this trick to HSIC. The interesting part to me is that they propose to minimize the SNR -- a quantity that characterizes the statistical power -- instead of the test statistics themselves. However, this contribution is not s

Reviewer 03 · Rating 5 · Confidence 3

Strengths

1. I like the idea of deriving an optimizer for independence testing directly from the hypothesis test error. This makes its motivation more explicit than that of information-theoretic methods, which can be thought of as 'indirect', or as inducing independence as a byproduct. The authors directly demonstrate how variational MI bounds are not aiming for P_e minimization. 2. The authors present formal guarantees of the proposed method and connect it to variational bounds of mutual information, which a

Weaknesses

1. I feel that the paper could benefit from better presentation; as it stands, it is hard for the reader to understand the contributions from a bird's-eye view. For example, some crucial parts are relegated to the appendix, such as the assumptions of the main theorem of the paper, while some parts do not benefit the presentation in my opinion, such as some of the very specific and technical details in Section 5. 2. As the authors utilize 'heavy' machinery such as NN optimization, I would expect at least one ex

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Data Classification