Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data
Junhao Liu, Siwei Xu, Lei Zhang, Jing Zhang

TL;DR
This study benchmarks instruction-tuned large language models for automating cell type annotation in single-cell genomics, demonstrating their robustness and potential to streamline complex biological data analysis without extra fine-tuning.
Contribution
Introduces SOAR (Single-cell Omics Arena), a benchmark that evaluates LLMs on cell type annotation across multiple datasets and modalities and highlights their ability to automate single-cell data interpretation.
Findings
LLMs can accurately classify cell types in single-cell RNA-seq data.
Chain-of-thought prompting enhances biological insight generation.
LLMs perform well across diverse datasets and species without fine-tuning.
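To make the findings above concrete, the following is a minimal Python sketch of how a zero-shot or chain-of-thought annotation prompt could be assembled from a cluster's marker genes and how replies could be scored. Everything in it is an assumption made for illustration: the function names, the prompt wording, the "Answer:" convention, and the exact-match metric are hypothetical and are not taken from the paper. No model is called; the sketch covers only prompt construction and scoring.

```python
from typing import Sequence


def build_annotation_prompt(
    marker_genes: Sequence[str],
    tissue: str,
    species: str = "human",
    chain_of_thought: bool = False,
) -> str:
    """Build a cell type annotation prompt (hypothetical template, not the paper's)."""
    genes = ", ".join(marker_genes)
    prompt = (
        f"The following genes are highly expressed in a cluster of {species} "
        f"{tissue} cells: {genes}.\n"
    )
    if chain_of_thought:
        # Ask for step-by-step reasoning plus a final line that is easy to parse.
        prompt += (
            "Reason step by step about what each gene indicates, then give the "
            "most likely cell type on a final line starting with 'Answer:'."
        )
    else:
        prompt += "Reply with the most likely cell type only."
    return prompt


def extract_answer(response: str) -> str:
    """Return the text after the last 'Answer:' line, or the whole reply if none exists."""
    for line in reversed(response.strip().splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith("answer:"):
            return stripped.split(":", 1)[1].strip()
    return response.strip()


def exact_match_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Case-insensitive exact-match accuracy (an assumed metric for this sketch)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    hits = sum(
        p.strip().lower() == t.strip().lower() for p, t in zip(predictions, labels)
    )
    return hits / len(labels)
```

Exact string matching is the simplest possible scoring rule and will undercount correct answers that use synonyms (for example "T lymphocyte" versus "T cell"), so a real evaluation would likely need label normalization or an ontology-aware comparison.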
Abstract
Over the past decade, the revolution in single-cell sequencing has enabled the simultaneous molecular profiling of various modalities across thousands of individual cells, allowing scientists to investigate the diverse functions of complex tissues and uncover underlying disease mechanisms. Among all the analytical steps, assigning individual cells to specific types is fundamental for understanding cellular heterogeneity. However, this process is usually labor-intensive and requires extensive expert knowledge. Recent advances in large language models (LLMs) have demonstrated their ability to efficiently process and synthesize vast corpora of text to automatically extract essential biological knowledge, such as marker genes, potentially promoting more efficient and automated cell type annotations. To thoroughly evaluate the capability of modern instruction-tuned LLMs in automating the…
Taxonomy
Topics: Single-cell and spatial transcriptomics · Bioinformatics and Genomic Networks · Gene expression and cancer classification
