Zipper: Addressing degeneracy in algorithm-agnostic inference
Geng Chen, Yinxu Jia, Guanghui Wang, Changliang Zou

TL;DR
The paper introduces Zipper, a novel method that addresses degeneracy in algorithm-agnostic goodness-of-fit tests by overlapping data splits, ensuring valid size control and improved power.
Contribution
Zipper provides a simple, effective device for non-degenerate, asymptotically normal test statistics in model-agnostic goodness-of-fit testing, enhancing power and validity.
Findings
Zipper guarantees asymptotic normality under the null hypothesis.
The method maintains valid size control across various settings.
Finite-sample experiments show strong performance of Zipper.
Abstract
The widespread use of black box prediction methods has sparked an increasing interest in algorithm/model-agnostic approaches for quantifying goodness-of-fit, with direct ties to specification testing, model selection and variable importance assessment. A commonly used framework involves defining a predictiveness criterion, applying a cross-fitting procedure to estimate the predictiveness, and utilizing the difference in estimated predictiveness between two models as the test statistic. However, even after standardization, the test statistic typically fails to converge to a non-degenerate distribution under the null hypothesis of equal goodness, leading to what is known as the degeneracy issue. To address this degeneracy issue, we present a simple yet effective device, Zipper. It draws inspiration from the strategy of additional splitting of testing data, but encourages an overlap…
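The degeneracy described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes ordinary least squares models, squared-error loss as the predictiveness criterion, and a single train/test split instead of cross-fitting. The helper names (`overlapping_split`, `split_statistic`), the overlap fraction `tau`, and the plug-in standard error are all hypothetical choices made for this toy setup. Under a null where the extra covariate is irrelevant, evaluating both models on the same test points (`tau = 1`) gives a difference whose standard error is close to zero, while evaluating them on two subsets that only partly overlap (`tau = 0.5`) keeps the standard error at a non-degenerate scale.

```python
import numpy as np

def ols_fit(X, y):
    # Least-squares coefficients with an intercept column prepended.
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def ols_predict(beta, X):
    Xd = np.column_stack([np.ones(len(X)), X])
    return Xd @ beta

def overlapping_split(n, tau, rng):
    # Hypothetical helper: indices for two equal-size evaluation sets,
    # A = shared + only_a and B = shared + only_b, sharing about tau * n points.
    # tau = 0 gives disjoint halves; tau = 1 gives two copies of the full set.
    perm = rng.permutation(n)
    n_overlap = int(round(tau * n))
    shared, rest = perm[:n_overlap], perm[n_overlap:]
    half = len(rest) // 2
    return shared, rest[:half], rest[half:2 * half]

def split_statistic(loss_full, loss_reduced, shared, only_a, only_b):
    # Difference: mean reduced-model loss on A minus mean full-model loss on B.
    m = len(shared) + len(only_a)
    diff = (loss_reduced[shared].sum() + loss_reduced[only_a].sum()
            - loss_full[shared].sum() - loss_full[only_b].sum()) / m
    # Plug-in variance for this toy setup: test points are independent given
    # the fitted models, so each group's contribution is added separately.
    var = 0.0
    if len(shared) > 1:
        var += len(shared) * np.var(loss_reduced[shared] - loss_full[shared], ddof=1)
    if len(only_a) > 1:
        var += len(only_a) * np.var(loss_reduced[only_a], ddof=1)
    if len(only_b) > 1:
        var += len(only_b) * np.var(loss_full[only_b], ddof=1)
    return diff, np.sqrt(var) / m

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))
# The third covariate is irrelevant, so the full and reduced models are
# equally good: the null hypothesis of equal goodness holds.
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

train, test = np.arange(n // 2), np.arange(n // 2, n)
beta_full = ols_fit(X[train], y[train])
beta_reduced = ols_fit(X[train][:, :2], y[train])
loss_full = (y[test] - ols_predict(beta_full, X[test])) ** 2
loss_reduced = (y[test] - ols_predict(beta_reduced, X[test][:, :2])) ** 2

n_test = len(test)
diff_vanilla, se_vanilla = split_statistic(
    loss_full, loss_reduced, *overlapping_split(n_test, 1.0, rng))
diff_overlap, se_overlap = split_statistic(
    loss_full, loss_reduced, *overlapping_split(n_test, 0.5, rng))

print(f"full overlap (tau=1.0): diff={diff_vanilla:.4f}, se={se_vanilla:.4f}")
print(f"partial overlap (tau=0.5): diff={diff_overlap:.4f}, se={se_overlap:.4f}")
```

In this toy setup the standard error at `tau = 1` is driven only by the small disagreement between the two fitted models, which shrinks as the training sample grows. That is the degeneracy. Reducing the overlap brings in the variance of the individual losses, which does not vanish. How the overlap should be chosen, and how the variance is estimated in Zipper itself, is not recoverable from the truncated abstract and is not claimed here.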
Peer Reviews
Decision: NeurIPS 2024 spotlight
- broadly applicable scenario (model/algorithm agnostic inference)
- solves a known issue of a popular test statistic under the null hypothesis of 0 performance difference
- does so with minimal intervention to model fitting / testing (just split your data in a certain way, evaluate on carefully defined subsets, and combine)
- offers guidance on a few hyperparameters
- provides results including on estimating the variance of the test statistic
- suggests fruitful directions for future work
Nothing major at the moment, may revise before discussion period.
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
