Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection

Abhinaba Basu; Pavan Chakraborty

arXiv:2603.12349 · cs.LG · March 16, 2026

Open Access

TL;DR

This paper introduces BSDS, a formally verified, budget-aware evaluation metric for AI-guided scientific discovery, applies it to drug candidate selection, and shows that LLMs add no significant value over existing classifiers.

Contribution

The paper presents BSDS, a novel, formally verified metric for evaluating AI selection strategies under budget constraints, with a comprehensive case study in drug discovery.

Findings

01. The random-forest (RF) Greedy-ML proposer outperforms all LLM configurations.

02. LLMs provide no marginal value over the existing classifiers in this setting.

03. The framework generalizes across multiple benchmarks and parameter settings.

Abstract

Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies. This gap is intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric (20 theorems machine-checked by the Lean 4 proof assistant) that jointly penalizes false discoveries (lambda-weighted FDR) and excessive abstention (gamma-weighted coverage gap) at each budget level. Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that no proposer can inflate by performing well at a cherry-picked budget. As a case study, we apply BSDS/DQS to the question of whether LLMs add marginal value to an existing ML…
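The abstract names the ingredients of BSDS (a lambda-weighted false discovery rate and a gamma-weighted coverage gap, each evaluated at a given budget) and of DQS (the budget average), but not their exact formulas. The Python sketch below assumes one simple form, BSDS(B) = 1 - λ·FDR(B) - γ·gap(B), where gap(B) is taken to be the fraction of the budget left unused through abstention, and DQS is taken to be the unweighted mean over a fixed budget grid. These definitions, the `Selection` record, the example proposers, and the default weights λ = γ = 1 are hypothetical illustrations, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Selection:
    """Hypothetical record of one proposer's choices at one budget level."""
    budget: int      # B: experiments the proposer is allowed to request
    n_selected: int  # candidates actually sent for validation (<= B)
    n_false: int     # selected candidates that failed validation


def bsds(sel: Selection, lam: float = 1.0, gamma: float = 1.0) -> float:
    """ASSUMED form: 1 - lam * FDR - gamma * coverage_gap (not the paper's formula)."""
    if sel.budget <= 0:
        raise ValueError("budget must be positive")
    if not 0 <= sel.n_false <= sel.n_selected <= sel.budget:
        raise ValueError("require 0 <= n_false <= n_selected <= budget")
    # False discovery rate among selected candidates; taken as 0 if nothing is selected.
    fdr = sel.n_false / sel.n_selected if sel.n_selected else 0.0
    # Coverage gap: fraction of the budget left unused because the proposer abstained.
    coverage_gap = (sel.budget - sel.n_selected) / sel.budget
    return 1.0 - lam * fdr - gamma * coverage_gap


def dqs(selections: Sequence[Selection], lam: float = 1.0, gamma: float = 1.0) -> float:
    """ASSUMED form: unweighted mean of BSDS over a fixed grid of budgets."""
    if not selections:
        raise ValueError("need at least one budget level")
    return sum(bsds(s, lam, gamma) for s in selections) / len(selections)


# Two hypothetical proposers evaluated on the same budget grid (B = 10 and B = 50).
cherry = [Selection(10, 10, 1), Selection(50, 20, 10)]  # strong only at B = 10
steady = [Selection(10, 10, 3), Selection(50, 45, 9)]   # consistent across budgets

if __name__ == "__main__":
    for name, sels in (("cherry", cherry), ("steady", steady)):
        per_budget = [round(bsds(s), 3) for s in sels]
        print(name, per_budget, "DQS =", round(dqs(sels), 3))
```

In this toy comparison the hypothetical `cherry` proposer wins at B = 10 (0.9 against 0.7) but scores lower on DQS (0.4 against 0.7) once its weak showing at B = 50 is averaged in, which is the kind of single-budget cherry-picking the budget average is meant to rule out.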

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Scientific Computing and Data Management