Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability

Douglas Jiang; Zilin Dai; Luxuan Zhang; Qiyi Yu; Haoqi Sun; Feng Tian

arXiv:2505.07896 · q-bio.GN · May 14, 2025

TL;DR

This paper introduces a novel framework that combines gene annotations and large language models to generate biologically meaningful cell embeddings from single-cell RNA sequencing data, enhancing interpretability in cell analysis.

Contribution

It presents a new multimodal approach that integrates gene descriptions with language models to improve cell-type clustering and vulnerability analysis in single-cell transcriptomics.

Findings

01 Enhanced cell clustering accuracy

02 Improved interpretability of cell vulnerability

03 Effective integration of biological annotations with language models

Abstract

Understanding cell identity and function from single-cell sequencing data remains a key challenge in computational biology. We present a novel framework that leverages gene-specific textual annotations from the NCBI Gene database to generate biologically contextualized cell embeddings. For each cell in a single-cell RNA sequencing (scRNA-seq) dataset, we rank genes by expression level, retrieve their NCBI Gene descriptions, and transform these descriptions into vector embeddings using large language models (LLMs). The models used include OpenAI text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large (Jan 2024), as well as the domain-specific models BioBERT and SciBERT. Embeddings are computed via an expression-weighted average across the top N most highly expressed genes in each cell, providing a compact, semantically rich representation. This…
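The expression-weighted averaging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cell_embedding`, the toy 2-dimensional gene embeddings, and the choice to normalize by the sum of the selected expression values are all assumptions made for illustration. In practice the per-gene vectors would come from embedding each gene's NCBI Gene description with one of the listed models.

```python
import numpy as np


def cell_embedding(expression, gene_embeddings, top_n=3):
    """Expression-weighted average of the embeddings of the top_n expressed genes.

    expression:      1-D array with one expression value per gene (hypothetical input).
    gene_embeddings: 2-D array of shape (n_genes, dim), one precomputed
                     description embedding per gene (toy values here).
    top_n:           number of most highly expressed genes to keep.
    """
    expression = np.asarray(expression, dtype=float)
    gene_embeddings = np.asarray(gene_embeddings, dtype=float)
    top_n = min(top_n, expression.shape[0])

    # Rank genes by expression, highest first, and keep the top_n indices.
    top_idx = np.argsort(expression)[::-1][:top_n]
    weights = expression[top_idx]

    total = weights.sum()
    if total <= 0:
        # Assumption: a cell with no expressed genes maps to the zero vector.
        return np.zeros(gene_embeddings.shape[1])

    # Weighted sum of the selected gene vectors, normalized by total weight.
    return (weights[:, None] * gene_embeddings[top_idx]).sum(axis=0) / total


# Toy example: 4 genes with 2-dimensional embeddings.
expression = [5.0, 0.0, 3.0, 1.0]
gene_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
cell = cell_embedding(expression, gene_embeddings, top_n=2)
```

With `top_n=2`, only the genes with expression 5 and 3 contribute, so the cell vector is pulled toward the embedding of the more highly expressed gene. Whether the original work normalizes weights this way, or applies a transform such as log-scaling first, is not stated in the abstract.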
