OmniCellTOSG: The First Cell Text-Omic Signaling Graphs Dataset for Graph Language Foundation Modeling
Heming Zhang, Tim Xu, Dekang Cao, Shunning Liang, Guntaas Shergill, Nicholas Hadas, Lars Schimmelpfennig, Levi Kaster, Di Huang, Guangfu Li, S. Peter Goedegebuure, David DeNardo, Li Ding, Ryan C. Fields, J Philip Miller, Pirooz Eghtesady, Carlos Cruchaga, William Buchser

TL;DR
This paper introduces OmniCellTOSG, a large-scale multimodal graph dataset that integrates biomedical text, signaling networks, and omic data to support interpretable analysis of single-cell RNA sequencing data across diseases.
Contribution
The paper presents the novel Text-Omic Signaling Graph (TOSG) data structure, constructs the OmniCellTOSG dataset, and develops a new graph language foundation model for comprehensive single-cell data analysis.
Findings
CellTOSG-FM outperforms existing models on multiple tasks.
Provides interpretable insights into disease mechanisms.
Enables integration of textual, signaling, and omic data.
Abstract
With the rapid growth of large-scale single-cell omic datasets, omic foundation models (FMs) have emerged as powerful tools for advancing research in life sciences and precision medicine. However, most existing omic FMs rely primarily on numerical transcriptomic data, treating sorted genes as sequences, and lack explicit integration of the biomedical prior knowledge and signaling interactions that are critical for scientific discovery. Here, we introduce the Text-Omic Signaling Graph (TOSG), a novel data structure that unifies human-interpretable biomedical textual knowledge, quantitative omic data, and signaling network information. Using this framework, we construct OmniCellTOSG, a large-scale resource comprising approximately half a million meta-cell TOSGs derived from around 80 million single-cell and single-nucleus RNA-seq profiles across organs and diseases. We further develop…
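As a rough illustration of the idea in the abstract, a TOSG can be pictured as a graph whose nodes carry both a text description and a numeric omic value, with edges standing for signaling interactions. The sketch below is not the authors' implementation or file format: the class names `Entity` and `TextOmicSignalingGraph`, the field names, the `build_example` helper, and the expression values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """One node: a biological entity with text and a numeric omic value (hypothetical layout)."""
    name: str
    description: str   # human-interpretable biomedical text
    expression: float  # quantitative omic measurement (placeholder value)


@dataclass
class TextOmicSignalingGraph:
    """Minimal container combining text, omic values, and signaling edges (illustrative only)."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each edge is (source, target, relation), e.g. ("EGFR", "KRAS", "activates").
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def add_edge(self, source: str, target: str, relation: str) -> None:
        # Refuse edges that point at entities the graph does not know about.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target, relation))

    def neighbors(self, name: str) -> List[str]:
        """Return the downstream targets of one entity."""
        return [target for source, target, _ in self.edges if source == name]


def build_example() -> TextOmicSignalingGraph:
    """Build a two-node toy graph; the expression values are made up."""
    graph = TextOmicSignalingGraph()
    graph.add_entity(Entity("EGFR", "Receptor tyrosine kinase.", 5.2))
    graph.add_entity(Entity("KRAS", "Small GTPase downstream of EGFR.", 3.1))
    graph.add_edge("EGFR", "KRAS", "activates")
    return graph
```

Calling `build_example().neighbors("EGFR")` returns the downstream entity names for that node. Keeping text, numeric values, and edges in one object is what lets a single model consume all three modalities together, which is the integration the abstract says existing omic FMs lack.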
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks · Advanced Graph Neural Networks
