The Verification Tax: Fundamental Limits of AI Auditing in the Rare-Error Regime

Jason Z Wang

arXiv:2604.12951 · cs.LG · April 15, 2026

TL;DR

This paper establishes fundamental limits on verifying the calibration of AI models, showing that as models improve, verifying their calibration becomes inherently harder because of a statistical "verification tax."

Contribution

It introduces a theoretical framework that quantifies the fundamental difficulty of verifying AI calibration, and supports its implications with extensive empirical validation.

Findings

01

Calibration verification is limited by a minimax rate of $\Theta\big((L\varepsilon/m)^{1/3}\big)$.
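Because $\Theta(\cdot)$ hides constants, the rate is best read as a scaling law. The sketch below assumes that m is the number of labeled evaluation samples and L is the Lipschitz constant mentioned in the abstract (our reading), drops all constants, and inverts the rate to estimate how many labels are needed before a calibration error of size δ rises above the floor. The helper names are hypothetical, and the outputs are orders of magnitude, not sample-size guarantees.

```python
# Scaling-law sketch of the minimax floor Theta((L*eps/m)**(1/3)).
# Assumptions (ours, not the paper's): m = number of labeled samples,
# L = Lipschitz constant, and every constant hidden by Theta is set to 1.


def verification_floor(L: float, eps: float, m: int) -> float:
    """Order of the smallest calibration error detectable with m labels."""
    return (L * eps / m) ** (1.0 / 3.0)


def labels_needed(L: float, eps: float, delta: float) -> float:
    """Order of m at which the floor drops to delta: solve (L*eps/m)**(1/3) = delta."""
    return L * eps / delta ** 3


if __name__ == "__main__":
    # Hypothetical values chosen only to show the cubic dependence on delta.
    for delta in (0.1, 0.05, 0.01):
        print(f"delta={delta}: m ~ {labels_needed(1.0, 0.2, delta):,.0f}")
```

Since m scales as 1/δ³, halving the calibration error you want to certify multiplies the required number of labels by 8.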

02

Self-evaluation without labels provides no information about calibration.
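A toy illustration of why labels are indispensable: calibration depends on how confidence and correctness co-occur, whereas label-free self-evaluation only observes the confidences. In the sketch below, the equal-width binned ECE estimator, the bin count, and the data are our assumptions, not the paper's construction. The same ten confidence scores are paired with two different outcome vectors, so anything computed from the confidences alone is identical in both cases, yet the ECE differs. This conveys the intuition only; it is not the paper's proof.

```python
from typing import Sequence


def binned_ece(conf: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width confidence bins."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        # Clamp so a confidence of exactly 1.0 lands in the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    n = len(conf)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(accuracy - avg_conf)
    return ece


# Identical confidences: a label-free evaluator sees exactly the same input.
conf = [0.8] * 10
outcomes_a = [1] * 8 + [0] * 2  # 80% correct: calibrated
outcomes_b = [1] * 5 + [0] * 5  # 50% correct: overconfident

print(binned_ece(conf, outcomes_a))  # approximately 0 (floating-point residue)
print(binned_ece(conf, outcomes_b))  # approximately 0.3
```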

03

Verification cost increases exponentially with pipeline depth.

Abstract

The most cited calibration result in deep learning, a post-temperature-scaling ECE of 0.012 on CIFAR-100 (Guo et al., 2017), is below the statistical noise floor. We prove this is not a failure of the experiment but a law: the minimax rate for estimating calibration error with model error rate $\varepsilon$ is $\Theta\big((L\varepsilon/m)^{1/3}\big)$, and no estimator can beat it. This "verification tax" implies that as AI models improve, verifying their calibration becomes fundamentally harder, with the same exponent in opposite directions. We establish four results that contradict standard evaluation practice: (1) self-evaluation without labels provides exactly zero information about calibration, bounded by a constant independent of compute; (2) a sharp phase transition at $m\varepsilon \approx 1$, below which miscalibration is undetectable; (3) active querying eliminates the Lipschitz constant, collapsing…
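Two quick simulations convey the flavor of the abstract's first two claims; both are our illustrations under stated assumptions, not the paper's experiments. The first draws outcomes from a synthetic model that is perfectly calibrated by construction (confidences uniform on [0.5, 1], an arbitrary choice) and shows that a 15-bin ECE estimate is still positive and shrinks only as m grows: that residual is a noise floor. The second reads mε as the expected number of errors in m labeled samples (our heuristic reading) and computes the chance of seeing no errors at all, (1 - ε)^m ≈ e^(-mε). When mε is well below 1, the labeled sample usually contains no errors, so it carries no evidence about how errors relate to confidence. The helper names are hypothetical.

```python
import random


def binned_ece(conf, correct, n_bins=15):
    """Equal-width binned ECE (the bin count is our choice)."""
    totals = [[0, 0.0, 0] for _ in range(n_bins)]  # [count, sum_conf, sum_correct]
    for c, y in zip(conf, correct):
        t = totals[min(int(c * n_bins), n_bins - 1)]
        t[0] += 1
        t[1] += c
        t[2] += y
    n = len(conf)
    return sum(k / n * abs(s_y / k - s_c / k) for k, s_c, s_y in totals if k)


def calibrated_ece_floor(m, trials=20, seed=0):
    """Mean ECE estimate for a synthetic model that is perfectly calibrated."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        conf = [rng.uniform(0.5, 1.0) for _ in range(m)]
        # Each prediction is correct with probability equal to its confidence.
        correct = [1 if rng.random() < c else 0 for c in conf]
        estimates.append(binned_ece(conf, correct))
    return sum(estimates) / trials


def prob_no_errors(eps, m):
    """Chance that m i.i.d. predictions with error rate eps contain zero errors."""
    return (1.0 - eps) ** m


if __name__ == "__main__":
    for m in (1_000, 10_000):  # 10,000 is the size of the CIFAR-100 test set
        print(f"m={m}: mean ECE of a calibrated model ~ {calibrated_ece_floor(m):.4f}")
    for m in (100, 1_000, 5_000):
        print(f"eps=0.001, m={m}: P(no errors) = {prob_no_errors(0.001, m):.3f}")
```

With ε = 0.001, the chance of observing no errors falls from about 0.90 at m = 100 (mε = 0.1) to about 0.37 at m = 1,000 (mε = 1) and below 0.01 at m = 5,000, so the regime flips around mε ≈ 1.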
