CellVerse: Do Large Language Models Really Understand Cell Biology?

Fan Zhang; Tianyu Liu; Zhihong Zhu; Hao Wu; Haixin Wang; Donghao Zhou; Yefeng Zheng; Kun Wang; Xian Wu; Pheng-Ann Heng

arXiv:2505.07865·q-bio.QM·May 14, 2025

TL;DR

CellVerse is a comprehensive benchmark evaluating large language models on single-cell biology tasks, revealing current models' limited understanding and highlighting the need for further development in applying LLMs to cell biology.

Contribution

The paper introduces CellVerse, the first large-scale benchmark for LLMs in single-cell biology, systematically assessing models across multiple tasks and revealing significant performance gaps.

Findings

01

Existing specialist models underperform across tasks.

02

Generalist LLMs show preliminary understanding but lack accuracy.

03

None of the models significantly outperform random guessing in drug response prediction.
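Finding 03 is a comparison against a chance baseline. The summary here does not say how significance was assessed, so the following is only a minimal sketch of one common way to check such a claim: a one-sided exact binomial test on multiple-choice accuracy. The function names, the number of answer options, and the example counts are all hypothetical and are not taken from the paper.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float) -> float:
    """One-sided exact binomial p-value.

    Probability of getting at least `correct` answers right out of `total`
    if every answer were an independent guess that is right with
    probability `chance`.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def beats_chance(correct: int, total: int, num_options: int, alpha: float = 0.05) -> bool:
    """Return True if accuracy is significantly above uniform random guessing.

    Assumes each question has `num_options` equally likely answers
    (an assumption for illustration, not a detail from the paper).
    """
    chance = 1.0 / num_options
    return binomial_p_value(correct, total, chance) < alpha


# Hypothetical counts: 54/100 correct on a two-option task is not
# distinguishable from guessing, while 70/100 is.
print(beats_chance(54, 100, num_options=2))  # False
print(beats_chance(70, 100, num_options=2))  # True
```

Under this kind of test, an accuracy a few points above the chance rate on a modest number of questions does not count as outperforming random guessing, which is the sort of outcome Finding 03 describes.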

Abstract

Recent studies have demonstrated the feasibility of modeling single-cell data as natural language and the potential of leveraging powerful large language models (LLMs) for understanding cell biology. However, a comprehensive evaluation of LLMs' performance on language-driven single-cell analysis tasks remains unexplored. Motivated by this challenge, we introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data and encompasses three hierarchical levels of single-cell analysis tasks: cell type annotation (cell-level), drug response prediction (drug-level), and perturbation analysis (gene-level). We further systematically evaluate 14 open-source and closed-source LLMs, ranging from 160M to 671B parameters, on CellVerse. Remarkably, the experimental results reveal: (1) Existing…

Taxonomy

Methods

Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · Linear Layer · Weight Decay · Adam · Multi-Head Attention