BioVerge: A Comprehensive Benchmark and Study of Self-Evaluating Agents for Biomedical Hypothesis Generation

Fuyi Yang; Chenchen Ye; Mingyu Derek Ma; Yijia Xiao; Matthew Yang; Wei Wang

arXiv:2511.08866·cs.CL·November 13, 2025

Open Access

TL;DR

BioVerge introduces a standardized benchmark and an LLM-based agent framework for biomedical hypothesis generation, enabling exploration of complex scientific relationships with improved diversity, relevance, and novelty through structured data and self-evaluation.

Contribution

We present BioVerge, a comprehensive benchmark, and BioVerge Agent, an LLM-based agent framework, which together standardize LLM-based biomedical hypothesis generation, incorporating structured data, self-evaluation, and exploration strategies.

Findings

01. Different agent architectures affect exploration diversity and reasoning.

02. Structured and textual data sources each contribute unique insights.

03. Self-evaluation enhances hypothesis novelty and relevance.

Abstract

Hypothesis generation in biomedical research has traditionally centered on uncovering hidden relationships within vast scientific literature, often using methods like Literature-Based Discovery (LBD). Despite progress, current approaches typically depend on single data types or predefined extraction patterns, which restricts the discovery of novel and complex connections. Recent advances in Large Language Model (LLM) agents show significant potential, with capabilities in information retrieval, reasoning, and generation. However, their application to biomedical hypothesis generation has been limited by the absence of standardized datasets and execution environments. To address this, we introduce BioVerge, a comprehensive benchmark, and BioVerge Agent, an LLM-based agent framework, to create a standardized environment for exploring biomedical hypothesis generation at the frontier of…

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Artificial Intelligence in Healthcare and Education