scReader: Prompting Large Language Models to Interpret scRNA-seq Data
Cong Li, Qingqing Long, Yuanchun Zhou, Meng Xiao

TL;DR
This paper introduces scReader, a hybrid method that combines large language models with domain-specific models to interpret single-cell RNA sequencing data across species, improving cell annotation accuracy and cross-species interoperability.
Contribution
The study presents a novel hybrid approach that leverages LLMs and gene-level representations for cross-species single-cell data interpretation, addressing the disparity in data scale across species.
Findings
Improved cell annotation accuracy over existing methods
Enhanced cross-species data interoperability
Effective visualization of single-cell gene expression data
Abstract
Large language models (LLMs) have demonstrated remarkable advancements, primarily due to their capabilities in modeling the hidden relationships within text sequences. This innovation presents a unique opportunity in the field of life sciences, where vast collections of single-cell omics data from multiple species provide a foundation for training foundational models. However, the challenge lies in the disparity of data scales across different species, hindering the development of a comprehensive model for interpreting genetic data across diverse organisms. In this study, we propose an innovative hybrid approach that integrates the general knowledge capabilities of LLMs with domain-specific representation models for single-cell omics data interpretation. We begin by focusing on genes as the fundamental unit of representation. Gene representations are initialized using functional…
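The abstract is cut off before it explains how the gene representations are used downstream. As a rough illustration of treating genes as the unit of representation, the sketch below assumes a simple, commonly used pattern that is not taken from scReader. A cell is embedded as the expression-weighted average of per-gene vectors, then annotated with the reference cell type whose centroid is most cosine-similar. The gene vectors, centroids, and function names (`cell_embedding`, `annotate`) are hypothetical toy values chosen for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cell_embedding(expression: Dict[str, float], gene_emb: Dict[str, Vector]) -> Vector:
    """Hypothetical aggregation: expression-weighted mean of gene embeddings."""
    dim = len(next(iter(gene_emb.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for gene, value in expression.items():
        vec = gene_emb.get(gene)
        # Skip genes with no embedding or no measured expression.
        if vec is None or value <= 0:
            continue
        for i, x in enumerate(vec):
            total[i] += value * x
        weight_sum += value
    if weight_sum == 0:
        return total  # all-zero vector when nothing contributes
    return [x / weight_sum for x in total]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def annotate(cell_vec: Vector, centroids: Dict[str, Vector]) -> str:
    """Assign the label of the most cosine-similar reference centroid."""
    return max(centroids, key=lambda label: cosine(cell_vec, centroids[label]))


# Toy 2-D gene embeddings and cell-type centroids (hypothetical values).
gene_emb = {"CD3E": [1.0, 0.0], "CD19": [0.0, 1.0], "ACTB": [0.5, 0.5]}
centroids = {"T cell": [1.0, 0.1], "B cell": [0.1, 1.0]}

cell = {"CD3E": 5.0, "ACTB": 1.0, "CD19": 0.0}
print(annotate(cell_embedding(cell, gene_emb), centroids))  # T cell
```

Because the cell vector lives in the same space as the gene vectors, any gene with a learned embedding can contribute, which is one reason gene-level representations are attractive when data volume differs across species.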
Taxonomy
Topics: Cancer-related molecular mechanisms research · MicroRNA in disease regulation
