Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking
Justin Sybrandt, Michael Shtutman, Ilya Safro

TL;DR
This paper introduces a scalable numerical evaluation framework for hypothesis generation systems, validated through real-world experiments that led to a novel scientific discovery about HIV-related neurodegeneration.
Contribution
It presents a new validation method for hypothesis generation systems using thousands of hypotheses and novel metrics, aligning automated validation with research goals.
Findings
Validation method correlates with real-world research outcomes
Discovered a new link between HAND and DDX3 through automated hypothesis ranking
Framework enables scalable, objective evaluation of hypothesis generation systems
Abstract
The first step of many research projects is to define and rank a short list of candidates for study. In the modern rapidity of scientific progress, some turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without any standard numerical evaluation method, many validate general-purpose HG systems by rediscovering a handful of historical findings, and some wishing to be more thorough may run laboratory experiments based on automatic suggestions. These methods are expensive, time consuming, and cannot scale. Thus, we present a numerical evaluation framework for the purpose of validating HG systems that leverages thousands of validation hypotheses. This method evaluates a HG system…
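As a rough illustration of what scoring an HG system by candidate ranking can look like, the sketch below takes system scores for a set of validation hypotheses, each labeled as a known connection (1) or not (0), and computes the probability that a known connection outranks an unknown one (the ROC AUC). The choice of metric, the function name `ranking_auc`, and the toy data are assumptions made for this example; the summary above does not state which metrics the paper uses.

```python
from typing import Sequence


def ranking_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the fraction of (positive, negative) pairs ranked correctly.

    Assumed metric (ROC AUC via pairwise comparison); ties count as half.
    `scores` are hypothetical HG-system outputs, higher meaning more plausible.
    `labels` are 1 for validation hypotheses treated as true, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative hypothesis")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # tie: half credit
    return correct / (len(positives) * len(negatives))


# Toy data (made up): four hypotheses, two labeled as true connections.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auc = ranking_auc(toy_scores, toy_labels)
```

A value near 1.0 would mean the system consistently ranks known connections above the rest, while 0.5 corresponds to a random ordering. The pairwise loop is quadratic; for many thousands of hypotheses a sort-based rank-sum computation would be the more practical choice.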
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Scientific Computing and Data Management
