Benchmarking AI scientists for omics data driven biological discovery

Erpai Luo; Jinmeng Jia; Yifan Xiong; Xiangyu Li; Xiaobo Guo; Baoqi Yu; Minsheng Hao; Lei Wei; Xuegong Zhang

arXiv:2505.08341 · cs.AI · January 21, 2026

TL;DR

BAISBench is a benchmark that evaluates AI scientists on real single-cell transcriptomic data through two tasks, cell type annotation and scientific discovery, and highlights their current capabilities and limitations.

Contribution

Introduction of BAISBench, a comprehensive benchmark for assessing AI scientists' performance on real biological data and discovery tasks in single-cell transcriptomics.

Findings

01. AI scientists show potential but do not match human experts.

02. Current AI systems outperform baseline models in some tasks.

03. The benchmark provides a realistic evaluation of AI in biological research.

Abstract

Recent advances in large language models have enabled the emergence of AI scientists that aim to autonomously analyze biological data and assist scientific discovery. Despite rapid progress, it remains unclear to what extent these systems can extract meaningful biological insights from real experimental data. Existing benchmarks either evaluate reasoning in the absence of data or focus on predefined analytical outputs, failing to reflect realistic, data-driven biological research. Here, we introduce BAISBench (Biological AI Scientist Benchmark), a benchmark for evaluating AI scientists on real single-cell transcriptomic datasets. BAISBench comprises two tasks: cell type annotation across 15 expert-labeled datasets, and scientific discovery through 193 multiple-choice questions derived from biological conclusions reported in 41 published single-cell studies. We evaluated several…

Code & Models

Repositories

eperluo/baisbench (official)
