How Much Data is Enough? The Zeta Law of Discoverability in Biomedical Data, featuring the enigmatic Riemann zeta function

Paul M. Thompson

arXiv:2604.17581 · cs.LG · April 21, 2026

TL;DR

This paper introduces a spectral scaling-law framework, built around the Riemann zeta function, for predicting when a biomedical dataset is large enough and how much additional data will improve the performance of data analyses and AI models.

Contribution

It proposes a new theoretical model that links the spectral properties of data to how performance scales, which can guide both data collection and model development.

Findings

1. Spectral decay and signal alignment follow a zeta-like power-law scaling.

2. Representation learning improves sample efficiency by concentrating the signal in stable spectral modes.

3. The framework predicts, as a function of dataset size, when simpler models outperform more complex ones and vice versa.
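The summary does not state the paper's spectral model. As a minimal sketch of why the Riemann zeta function can appear at all, assume (hypothetically) that the k-th spectral mode carries signal-to-noise energy proportional to k^(-s) for some exponent s > 1. The total energy over all modes is then proportional to ζ(s) = Σ_{k≥1} k^(-s), and the energy captured by the first K modes is a partial sum of that series. The helper names `zeta` and `modes_for_fraction` are illustrative, not taken from the paper.

```python
import math


def zeta(s: float, n_terms: int = 1000) -> float:
    """Riemann zeta(s) for real s > 1: partial sum plus an Euler-Maclaurin tail correction."""
    if s <= 1:
        raise ValueError("the series sum k^-s converges only for s > 1")
    N = n_terms
    head = sum(k ** (-s) for k in range(1, N))  # terms k = 1 .. N-1
    # Euler-Maclaurin estimate of sum_{k>=N} k^-s: integral + half boundary term + first derivative term
    tail = N ** (1 - s) / (s - 1) + N ** (-s) / 2 + s * N ** (-s - 1) / 12
    return head + tail


def modes_for_fraction(s: float, p: float) -> int:
    """Smallest K such that the first K modes hold a fraction p of the total energy zeta(s).

    Assumes (hypothetically) that mode k carries energy proportional to k^-s.
    """
    target = p * zeta(s)
    total, k = 0.0, 0
    while total < target:
        k += 1
        total += k ** (-s)
    return k


for s in (1.5, 2.0, 3.0):
    # Modes needed for 90% and 99% of the energy under the assumed decay law
    print(s, modes_for_fraction(s, 0.9), modes_for_fraction(s, 0.99))
```

Under this assumed law with s = 2, the first 6 modes already hold 90% of the energy, while reaching 99% takes 61 modes: each further tenfold cut in missed signal costs roughly ten times as many modes, the diminishing-returns pattern a zeta-type law implies. A larger s, meaning signal concentrated in fewer modes in the spirit of the second finding, shrinks both counts (for s = 3, 99% of the energy sits in the first 6 modes).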

Abstract

How much data is enough to make a scientific discovery? As biomedical datasets scale to millions of samples and AI models grow in capacity, progress increasingly depends on predicting when additional data will substantially improve performance. In practice, model development often relies on empirical scaling curves measured across architectures, modalities, and dataset sizes, with limited theoretical guidance on when performance should improve, saturate, or exhibit cross-over behavior. We propose a scaling-law framework for cross-modal discoverability based on the spectral structure of data covariance operators, task-aligned signal projections, and learned representations. Many performance metrics, including AUC, can be expressed in terms of the signal-to-noise energy accumulated across the identifiable spectral modes of an encoder and a cross-modal operator. Under mild assumptions, this…
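The abstract does not give its AUC formula. One textbook case where AUC is literally a function of summed per-mode signal-to-noise energy is two Gaussian classes sharing a covariance with eigenpairs (λ_k, u_k) and mean difference μ: the squared Mahalanobis separation is Δ² = Σ_k (u_k·μ)² / λ_k, and the optimal linear rule has AUC = Φ(Δ/√2), with Φ the standard normal CDF. The sketch below uses that case as a stand-in, not as the paper's derivation. The decay law c·k^(-s) and the rule K(n) = ⌊n^α⌋ for how many modes n samples can identify are hypothetical assumptions, as are all function names.

```python
import math
from typing import Optional, Sequence


def auc_from_energy(energy: float) -> float:
    """AUC of the optimal linear rule for two Gaussian classes with a shared covariance.

    energy is the squared Mahalanobis separation Delta^2; AUC = Phi(Delta / sqrt(2)),
    and Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so AUC = 0.5 * (1 + erf(Delta / 2)).
    """
    delta = math.sqrt(max(energy, 0.0))
    return 0.5 * (1.0 + math.erf(delta / 2.0))


def spectral_energy(eigvals: Sequence[float], projections: Sequence[float]) -> float:
    """Delta^2 = sum_k (u_k . mu)^2 / lambda_k: per-mode SNR energy summed over modes."""
    return sum(p * p / lam for lam, p in zip(eigvals, projections))


def auc_with_k_modes(k: int, c: float, s: float) -> float:
    """AUC using only the first k modes, if mode j carries energy c * j**(-s) (hypothetical)."""
    return auc_from_energy(c * sum(j ** (-s) for j in range(1, k + 1)))


def identifiable_modes(n: int, alpha: float) -> int:
    """Hypothetical rule: n samples allow floor(n**alpha) modes to be estimated reliably."""
    return math.floor(n ** alpha + 1e-9)  # epsilon guards against round-off at perfect powers


def samples_needed(target_auc: float, c: float, s: float, alpha: float,
                   max_modes: int = 1_000_000) -> Optional[int]:
    """Smallest n whose identifiable modes reach target_auc, or None if not within max_modes."""
    energy, k = 0.0, 0
    while auc_from_energy(energy) < target_auc:
        k += 1
        if k > max_modes:
            # The ceiling is Phi(sqrt(c * zeta(s) / 2)); targets at or above it are never reached.
            return None
        energy += c * k ** (-s)
    # Invert floor(n**alpha) >= k, then nudge n to the exact minimum.
    n = max(1, math.ceil(k ** (1.0 / alpha) - 1e-9))
    while identifiable_modes(n, alpha) < k:
        n += 1
    while n > 1 and identifiable_modes(n - 1, alpha) >= k:
        n -= 1
    return n


# Two-mode check: eigenvalues (4, 1), projections (2, 1) give Delta^2 = 1 + 1 = 2, AUC = Phi(1).
print(round(auc_from_energy(spectral_energy([4.0, 1.0], [2.0, 1.0])), 4))  # prints 0.8413
print(samples_needed(0.8, c=1.0, s=2.0, alpha=0.5))  # prints 16
```

With c = 1, s = 2 and α = 0.5, the asymptotic AUC ceiling is Φ(√(ζ(2)/2)) ≈ 0.82. Reaching AUC 0.80 needs 4 identifiable modes, hence n = 16 under the assumed square-root rule, while a target of 0.90 returns None because, under these assumptions, no amount of data clears the ceiling. A different K(n) or decay law changes the numbers, not the structure of the calculation.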
