Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
Abhinaba Basu, Pavan Chakraborty

TL;DR
This paper introduces BSDS, a formally verified, budget-aware evaluation metric for AI-guided scientific discovery, demonstrating its application in drug candidate selection and showing LLMs add no significant value over existing classifiers.
Contribution
The paper presents BSDS, a novel, formally verified metric for evaluating AI selection strategies under budget constraints, with a comprehensive case study in drug discovery.
Findings
The RF-based Greedy-ML proposer outperforms all LLM configurations.
LLMs do not provide marginal value over existing classifiers in this setting.
The framework generalizes across multiple benchmarks and parameter settings.
Abstract
Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies -- a gap intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric -- 20 theorems machine-checked by the Lean 4 proof assistant -- that jointly penalizes false discoveries (lambda-weighted FDR) and excessive abstention (gamma-weighted coverage gap) at each budget level. Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that no proposer can inflate by performing well at a cherry-picked budget. As a case study, we apply BSDS/DQS to: do LLMs add marginal value to an existing ML…
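The abstract names the ingredients of BSDS (a lambda-weighted FDR penalty and a gamma-weighted coverage-gap penalty at each budget level) and of DQS (an average over budgets), but this excerpt does not give the exact formula. The sketch below is therefore only an illustration under stated assumptions, not the authors' verified definition: it assumes the per-budget score takes the form precision - lam * FDR - gamma * coverage_gap, that a proposer submits a ranked list of binary outcomes, and that DQS is an unweighted mean over budgets. The function names `bsds` and `dqs` and their parameters are hypothetical.

```python
from typing import Sequence


def bsds(outcomes: Sequence[int], budget: int, lam: float = 1.0, gamma: float = 0.5) -> float:
    """Illustrative per-budget score (assumed form, not the paper's definition).

    outcomes: ranked selections, 1 = true discovery, 0 = false discovery.
    budget:   maximum number of candidates that may be validated.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    picks = list(outcomes)[:budget]          # a proposer cannot exceed the budget
    n = len(picks)
    hits = sum(picks)
    precision = hits / n if n else 0.0
    fdr = (n - hits) / n if n else 0.0       # false discovery rate among the picks
    coverage_gap = (budget - n) / budget     # unused share of the budget (abstention)
    return precision - lam * fdr - gamma * coverage_gap


def dqs(outcomes: Sequence[int], budgets: Sequence[int], lam: float = 1.0, gamma: float = 0.5) -> float:
    """Assumed budget-averaged summary: unweighted mean of the per-budget scores."""
    if not budgets:
        raise ValueError("at least one budget is required")
    return sum(bsds(outcomes, b, lam, gamma) for b in budgets) / len(budgets)


if __name__ == "__main__":
    ranked = [1, 1, 0, 1]                    # made-up outcomes for illustration only
    print(bsds(ranked, budget=4))            # one false discovery, full coverage
    print(bsds(ranked[:2], budget=4))        # no false discoveries, half the budget unused
    print(dqs(ranked, budgets=[2, 4]))       # averaged across two budget levels
```

Averaging over several budgets is what keeps a proposer from looking good by excelling at one hand-picked budget: a strong score at budget 2 is diluted by a weaker score at budget 4.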
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Scientific Computing and Data Management
