A Kernel Distribution Closeness Testing
Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu

TL;DR
This paper introduces a norm-adaptive MMD (NAMMD) for distribution closeness testing, improving test power and applicability to complex data like images by addressing limitations of traditional MMD.
Contribution
We propose NAMMD, a new discrepancy measure that scales MMD with RKHS norms, enhancing distribution closeness testing and two-sample testing performance.
Findings
NAMMD-based DCT has higher test power than MMD-based DCT.
Theoretical proof of bounded type-I error for NAMMD-based DCT.
Validated effectiveness on synthetic and real image data.
Abstract
Distribution closeness testing (DCT) assesses whether the distance between a distribution pair is at least ε-far. Existing DCT methods mainly measure discrepancies between a distribution pair defined on discrete one-dimensional spaces (e.g., using total variation), which limits their applications to complex data (e.g., images). To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measurement of the distributional discrepancy between two complex distributions, into DCT scenarios. However, we find that MMD's value can be the same for many pairs of distributions that have different norms in the same reproducing kernel Hilbert space (RKHS), making MMD less informative when assessing the closeness levels for multiple distribution pairs. To mitigate the issue, we design a new measurement of distributional discrepancy,…
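As a rough illustration of the quantities the abstract refers to, the Python sketch below estimates the squared MMD between two samples together with the squared RKHS norms of their empirical kernel mean embeddings. It is not the authors' code and does not implement NAMMD, whose exact formula is not given in this excerpt. The Gaussian kernel, the `bandwidth` parameter, the function names `gaussian_kernel` and `mmd_and_norms`, and the synthetic data are all assumptions made for the example, and the estimates are the simple biased (V-statistic) versions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_and_norms(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimates of MMD^2, ||mu_P||^2 and ||mu_Q||^2.

    Uses the identity MMD^2 = ||mu_P||^2 + ||mu_Q||^2 - 2 <mu_P, mu_Q>,
    where each term is an average of kernel evaluations.
    """
    norm_p = gaussian_kernel(x, x, bandwidth).mean()  # estimate of ||mu_P||^2
    norm_q = gaussian_kernel(y, y, bandwidth).mean()  # estimate of ||mu_Q||^2
    cross = gaussian_kernel(x, y, bandwidth).mean()   # estimate of <mu_P, mu_Q>
    mmd2 = norm_p + norm_q - 2.0 * cross
    return mmd2, norm_p, norm_q


# Illustrative data (an assumption): a narrow and a wide 2-D Gaussian sample.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(0.0, 3.0, size=(200, 2))

mmd2, norm_p, norm_q = mmd_and_norms(x, y)
mmd2_same, _, _ = mmd_and_norms(x, x)
print(f"MMD^2 = {mmd2:.4f}, ||mu_P||^2 = {norm_p:.4f}, ||mu_Q||^2 = {norm_q:.4f}")
```

With a bounded kernel such as this one, a more spread-out sample has a smaller embedding norm, so two pairs of distributions can share an MMD value while differing in their norms, which is the situation the abstract describes.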
Peer Reviews
Decision·Submitted to ICLR 2026
The closeness test is indeed different from traditional two-sample tests and represents an underexplored area in hypothesis testing, with potential downstream applications such as classifier transfer. The issue that kernel mean embeddings in MMD can have different RKHS norms but yield the same MMD value is a legitimate concern, and the solution proposed in Equation (1) is both elegant and easy to implement in practice. The proposed NAMMD method appears to consistently achieve higher testing
The main concern with this paper is that its technical innovation and motivation appear to be unrelated. From Figures 1(a) and 1(b), I can indeed see that MMD is not ideal when a and b have the same MMD value but their kernel mean embeddings have different RKHS norms. However, it is unclear why this becomes an issue specifically in the context of DCT. Without establishing this connection, the motivation remains weak. If DCT is susceptible to this particular issue of MMD, the authors should clari
1. The experiments are broad and cover both synthetic and real-world datasets. 2. The paper identifies an important but underexplored problem: distribution closeness testing (DCT).
1. The paper proposes a normalizing approach, but it does not rigorously prove that this scaling yields an optimal variance normalization or minimizes any formal criterion (e.g., unbiasedness or asymptotic efficiency). Thus, NAMMD’s normalization remains heuristic rather than theoretically grounded. 2. I find it somewhat concerning that the theory does not unify DCT and TST despite superficial similarity. The paper reverts to permutation calibration, admitting that the asymptotic distrib
- The paper introduces a conceptually simple yet elegant modification of MMD that aims to adapt to distributional scales, a long-standing issue in kernel-based two-sample testing. - The theoretical analysis (especially Theorem 9) provides partial intuition for when NAMMD may outperform MMD. - The empirical results indicate improved detection power in some settings, particularly for scale-shifted distributions. - The work situates itself in the broader line of kernel-based hypothesis testing a
- The motivation for the specific normalization by the sum of RKHS norms remains unclear. It is not obvious why this normalization is preferable to other kernel-based measures such as Kernel Canonical Correlation Analysis (KCCA; Akaho, 2001) or Hilbert–Schmidt Independence Criterion (HSIC; Gretton et al., 2005), both of which normalize by feature-space variance or covariance to account for scale differences. Clarifying the conceptual distinction between NAMMD and these established approaches wou
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Fault Detection and Control Systems
