NOVA: Fundamental Limits of Knowledge Discovery Through AI
Salman Avestimehr, Ken Duffy, and Muriel Médard

TL;DR
The paper introduces the NOVA framework to analyze the fundamental limits of AI-driven knowledge discovery, highlighting conditions for success and failure modes in iterative self-improvement processes.
Contribution
It models the generate-verify-accumulate loop as an adaptive sampling process, providing theoretical insights into discovery costs, failure modes, and the role of human guidance.
Findings
Identification of failure modes: contamination, forgetting, exploration failure, acceptance failure.
Derivation of a power-law scaling law for discovery costs: R_cum(D)=Θ(c_gen D^α).
Analysis of the impact of false positives and of the limitations of Good–Turing estimation.
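For readers unfamiliar with the Good–Turing estimator mentioned in the findings, here is a minimal Python sketch of its classical missing-mass form: the share of a batch made up of items seen exactly once is used as an estimate of the probability that the next sample is something not yet seen in that batch. The function name and the toy batch are hypothetical and not taken from the paper. The estimate depends only on the batch it is given, which is consistent with the paper's description of Good–Turing as a local batch-diversity diagnostic.

```python
from collections import Counter


def good_turing_missing_mass(batch):
    """Classical Good-Turing estimate of unseen probability mass: N1 / n.

    N1 is the number of distinct items that occur exactly once in the batch,
    and n is the batch size. Hypothetical helper, for illustration only.
    """
    n = len(batch)
    if n == 0:
        raise ValueError("batch must be non-empty")
    counts = Counter(batch)
    # Count the distinct items that were observed exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# Toy batch: "a" appears 3 times, "b" twice, "c" and "d" once each.
toy_batch = ["a", "b", "a", "c", "d", "a", "b"]
estimate = good_turing_missing_mass(toy_batch)
print(f"estimated unseen mass: {estimate:.4f}")
```

A batch in which every item is distinct yields an estimate of 1, and a batch with no singletons yields 0, so the value tracks how diverse the current batch is rather than how much of the whole domain remains undiscovered.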
Abstract
Can AI systems discover genuinely new knowledge through iterative self-improvement, and if so, at what cost? We introduce the NOVA framework, which models the common "generate, verify, accumulate, retrain" loop as an adaptive sampling process over a knowledge space. We identify sufficient conditions under which accumulated genuine knowledge eventually covers a finite domain, and show how their violations produce distinct failure modes: contamination, forgetting, exploration failure, and acceptance failure. We then analyze imperfect verification and identify a contamination trap: as easy-to-find knowledge is exhausted, the model mass assigned to new valid artifacts shrinks, so even small false-positive rates can cause invalid artifacts to enter the knowledge base faster than genuine discoveries. We clarify that Good–Turing estimation is a local batch-diversity diagnostic, not an…
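The contamination trap described in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is a simplified model assumed here for illustration, not the paper's formal analysis: each generated artifact is new and valid with probability `new_valid_mass`, invalid with probability `invalid_mass`, and the verifier accepts valid artifacts with probability `true_positive_rate` and invalid ones with probability `false_positive_rate`. All names and numbers are hypothetical.

```python
def expected_accept_rates(new_valid_mass, invalid_mass,
                          false_positive_rate, true_positive_rate=1.0):
    """Expected accepted artifacts per generated sample, by type.

    Assumed toy model (not from the paper): one independent draw per sample,
    followed by an independent verifier decision.
    Returns (genuine_rate, false_rate).
    """
    for name, value in [("new_valid_mass", new_valid_mass),
                        ("invalid_mass", invalid_mass),
                        ("false_positive_rate", false_positive_rate),
                        ("true_positive_rate", true_positive_rate)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    genuine_rate = true_positive_rate * new_valid_mass
    false_rate = false_positive_rate * invalid_mass
    return genuine_rate, false_rate


def contamination_dominates(new_valid_mass, invalid_mass,
                            false_positive_rate, true_positive_rate=1.0):
    """True when invalid artifacts are accepted faster than genuine ones."""
    genuine_rate, false_rate = expected_accept_rates(
        new_valid_mass, invalid_mass, false_positive_rate, true_positive_rate)
    return false_rate > genuine_rate


# Same verifier (1% false positives) and same invalid mass throughout;
# only the mass on new valid artifacts shrinks as easy discoveries run out.
for mass in (0.1, 0.01, 0.001):
    print(mass, contamination_dominates(mass, 0.5, 0.01))
```

In this toy setting the verifier never changes, yet the comparison flips once the mass on new valid artifacts falls below the product of the false-positive rate and the invalid mass, which is the qualitative effect the abstract describes.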
