NOVA: Fundamental Limits of Knowledge Discovery Through AI

Salman Avestimehr, Ken Duffy, and Muriel Médard

arXiv:2605.15219 · cs.AI · May 18, 2026

TL;DR

The paper introduces the NOVA framework to analyze the fundamental limits of AI-driven knowledge discovery, highlighting conditions for success and failure modes in iterative self-improvement processes.

Contribution

The paper models the generate-verify-accumulate loop as an adaptive sampling process over a knowledge space, yielding theoretical results on discovery costs, failure modes, and the role of human guidance.

Findings

01

Identification of four failure modes: contamination, forgetting, exploration failure, and acceptance failure.

02

Derivation of a power law for discovery costs: R_cum(D) = Θ(c_gen D^α).

03

Analysis of the impact of false positives and of the limitations of Good–Turing estimation.
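
For readers unfamiliar with Good–Turing estimation mentioned in finding 03, the following is a minimal sketch of the standard missing-mass estimator: the fraction of a batch made up of items seen exactly once. It is not taken from the paper; the function name `good_turing_missing_mass` and the toy batch are assumptions chosen for illustration.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Standard Good-Turing estimate of the probability mass on unseen items.

    The estimate is N1 / n, where N1 is the number of distinct items that
    appear exactly once in the batch and n is the batch size.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Count the distinct items observed exactly once (the "singletons").
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


# Hypothetical batch of generated artifacts, labeled by identity.
batch = ["a", "a", "b", "c", "c", "c", "d"]
estimate = good_turing_missing_mass(batch)  # "b" and "d" are singletons
```

Because the estimate is computed from a single batch, it describes the diversity of that batch only, which fits the abstract's description of Good–Turing as a local batch-diversity diagnostic.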

Abstract

Can AI systems discover genuinely new knowledge through iterative self-improvement, and if so, at what cost? We introduce the NOVA framework, which models the common "generate, verify, accumulate, retrain" loop as an adaptive sampling process over a knowledge space. We identify sufficient conditions under which accumulated genuine knowledge eventually covers a finite domain, and show how their violations produce distinct failure modes: contamination, forgetting, exploration failure, and acceptance failure. We then analyze imperfect verification and identify a contamination trap: as easy-to-find knowledge is exhausted, the model mass assigned to new valid artifacts shrinks, so even small false-positive rates can cause invalid artifacts to enter the knowledge base faster than genuine discoveries. We clarify that Good–Turing estimation is a local batch-diversity diagnostic, not an…
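
The contamination trap described in the abstract can be illustrated with simple per-sample arithmetic. The sketch below is not the paper's model: it assumes each generated artifact is new and valid with probability `p_new_valid`, invalid with probability `p_invalid`, and that the verifier accepts valid artifacts at `true_positive_rate` and invalid ones at `false_positive_rate`. All names and numbers are hypothetical.

```python
def acceptance_rates(p_new_valid, p_invalid, false_positive_rate,
                     true_positive_rate=1.0):
    """Expected accepted artifacts per generated sample, split by kind.

    Returns (genuine_rate, false_rate) under the assumed toy model.
    """
    genuine_rate = p_new_valid * true_positive_rate
    false_rate = p_invalid * false_positive_rate
    return genuine_rate, false_rate


def is_contaminating(p_new_valid, p_invalid, false_positive_rate,
                     true_positive_rate=1.0):
    """True when invalid artifacts are accepted faster than genuine ones."""
    genuine_rate, false_rate = acceptance_rates(
        p_new_valid, p_invalid, false_positive_rate, true_positive_rate
    )
    return false_rate > genuine_rate


# Assumed scenario: the mass on new valid artifacts shrinks as easy
# discoveries are exhausted, while the false-positive rate stays at 1%.
results = [is_contaminating(p, p_invalid=0.5, false_positive_rate=0.01)
           for p in (0.2, 0.05, 0.001)]
```

In this toy scenario the false-acceptance rate is fixed at 0.5 × 0.01 = 0.005 per sample, so the loop only becomes contaminating once the mass on new valid artifacts falls below that value, even though the verifier itself never changed.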
