Zipper: Addressing degeneracy in algorithm-agnostic inference

Geng Chen; Yinxu Jia; Guanghui Wang; Changliang Zou

arXiv:2306.16852·stat.ME·June 30, 2023·NeurIPS·1 cites

TL;DR

The paper introduces Zipper, a novel method that addresses degeneracy in algorithm-agnostic goodness-of-fit tests by overlapping data splits, ensuring valid size control and improved power.

Contribution

Zipper provides a simple, effective device for non-degenerate, asymptotically normal test statistics in model-agnostic goodness-of-fit testing, enhancing power and validity.

Findings

01

Zipper guarantees asymptotic normality under the null hypothesis.

02

The method maintains valid size control across various settings.

03

Finite-sample experiments show strong performance of Zipper.
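
Asymptotic normality under the null is what allows a standardized statistic to be compared against standard normal quantiles. The following is a minimal sketch in Python, assuming a point estimate of the predictiveness difference and its standard error are already available. The function name `normal_test`, the one-sided alternative, and the default level `alpha=0.05` are illustrative assumptions, not details taken from the paper.

```python
import math


def normal_test(estimate, std_error, alpha=0.05):
    """One-sided z-test based on a normal approximation (illustrative only).

    estimate  : estimated difference in predictiveness (hypothetical input)
    std_error : estimated standard error of that difference (hypothetical input)
    alpha     : nominal significance level (assumed default)
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = estimate / std_error
    # Upper-tail probability P(Z >= z) for a standard normal Z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value, p_value < alpha
```

A normal approximation of this kind is only meaningful when the standardized statistic has a non-degenerate limit, which is the property the first finding refers to.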

Abstract

The widespread use of black box prediction methods has sparked an increasing interest in algorithm/model-agnostic approaches for quantifying goodness-of-fit, with direct ties to specification testing, model selection and variable importance assessment. A commonly used framework involves defining a predictiveness criterion, applying a cross-fitting procedure to estimate the predictiveness, and utilizing the difference in estimated predictiveness between two models as the test statistic. However, even after standardization, the test statistic typically fails to converge to a non-degenerate distribution under the null hypothesis of equal goodness, leading to what is known as the degeneracy issue. To address this degeneracy issue, we present a simple yet effective device, Zipper. It draws inspiration from the strategy of additional splitting of testing data, but encourages an overlap…
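
The idea of splitting the testing data into two parts that are allowed to overlap can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's estimator: the overlap fraction `tau`, the helper names `overlapping_split` and `predictiveness_gap`, and the use of per-observation losses as the (negated) predictiveness criterion are all hypothetical choices made for the example.

```python
import numpy as np


def overlapping_split(n, tau):
    """Return two index arrays over range(n) sharing roughly a fraction tau of each part.

    tau = 0 gives two disjoint halves; tau close to 1 gives nearly identical parts.
    This parametrization is an assumption made for illustration.
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    m = int(round(n / (2.0 - tau)))  # size of each part
    idx_a = np.arange(0, m)          # first m test points
    idx_b = np.arange(n - m, n)      # last m test points
    return idx_a, idx_b


def predictiveness_gap(loss_full, loss_reduced, tau):
    """Difference in estimated predictiveness, taking predictiveness as negative mean loss.

    loss_full    : per-observation test losses of the full model (hypothetical input)
    loss_reduced : per-observation test losses of the reduced model (hypothetical input)
    """
    loss_full = np.asarray(loss_full, dtype=float)
    loss_reduced = np.asarray(loss_reduced, dtype=float)
    idx_a, idx_b = overlapping_split(len(loss_full), tau)
    # The full model is scored on part A and the reduced model on part B.
    return loss_reduced[idx_b].mean() - loss_full[idx_a].mean()
```

With `tau = 0` the two models are scored on disjoint test points, which corresponds to the additional-splitting strategy mentioned in the abstract; larger values of `tau` reuse more test points for both models.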

Peer Reviews

Decision·NeurIPS 2024 spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- broadly applicable scenario (model/algorithm agnostic inference)
- solves a known issue of a popular test statistic under the null hypothesis of 0 performance difference
- does so with minimal intervention to model fitting / testing (just split your data in a certain way, evaluate on carefully defined subsets, and combine)
- offers guidance on a few hyperparameters
- provides results, including on estimating the variance of the test statistic
- suggests fruitful directions for future work

Weaknesses

Nothing major at the moment, may revise before discussion period.

Videos

Zipper: Addressing Degeneracy in Algorithm-Agnostic Inference · slideslive

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms