Multi-Dimensional Spectral Geometry of Biological Knowledge in Single-Cell Transformer Representations

Ihor Kendiukhov

arXiv:2602.22247 · q-bio.GN · February 27, 2026

Open Access

TL;DR

This study shows that scGPT, a single-cell transformer foundation model, encodes biological knowledge in a structured, interpretable geometric space that reflects subcellular localization, protein-protein interactions, and gene regulation.

Contribution

The paper systematically decodes the spectral geometry of scGPT's internal representations through automated hypothesis screening (183 hypotheses tested over 63 iterations), uncovering biologically meaningful axes and structures within the model.

Findings

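01 Genes are organized by subcellular localization along spectral axes, with secreted proteins at one pole of the dominant axis and cytosolic proteins at the other.

02 Intermediate transformer layers transiently encode mitochondrial and ER compartments in a sequence that mirrors the cellular secretory pathway.

03 The model distinguishes transcription factors from target genes with an AUROC of 0.744.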

Abstract

Single-cell foundation models such as scGPT learn high-dimensional gene representations, but what biological knowledge these representations encode remains unclear. We systematically decode the geometric structure of scGPT internal representations through 63 iterations of automated hypothesis screening (183 hypotheses tested), revealing that the model organizes genes into a structured biological coordinate system rather than an opaque feature space. The dominant spectral axis separates genes by subcellular localization, with secreted proteins at one pole and cytosolic proteins at the other. Intermediate transformer layers transiently encode mitochondrial and ER compartments in a sequence that mirrors the cellular secretory pathway. Orthogonal axes encode protein-protein interaction networks with graded fidelity to experimentally measured interaction strength (Spearman rho = 1.000…
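
To make the idea of a "dominant spectral axis" separating two gene classes concrete, the following is a minimal sketch on synthetic data. It is not the paper's pipeline: the embeddings are random numbers rather than scGPT representations, the "secreted" versus "cytosolic" labels are hypothetical, and using an SVD of the centered embedding matrix with a rank-based AUROC is an assumption about one reasonable way to run such an analysis.

```python
import numpy as np

def spectral_axes(embeddings, k=3):
    """Project each gene onto the top-k singular axes of the centered matrix."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # centered = U @ diag(S) @ Vt; the rows of Vt are the spectral axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T, singular_values[:k]

def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. The pairwise form is O(n_pos * n_neg), which is
    acceptable for a small illustration.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Synthetic stand-in for gene embeddings (assumption: 200 genes, 32 dimensions).
rng = np.random.default_rng(0)
n_genes, dim = 200, 32
is_secreted = np.arange(n_genes) < n_genes // 2   # hypothetical class labels
embeddings = rng.normal(size=(n_genes, dim))
embeddings[is_secreted, 0] += 4.0                  # planted class offset

proj, singular_values = spectral_axes(embeddings, k=3)

# The sign of a singular vector is arbitrary, so report the
# orientation-free separation along the dominant axis.
raw = auroc(proj[:, 0], is_secreted)
separation = max(raw, 1.0 - raw)
print(f"AUROC along dominant axis: {separation:.3f}")
```

Because the class offset is planted by construction, the dominant axis separates the two groups almost perfectly here; with real model representations the separation would have to be tested against appropriate null controls.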

Taxonomy

Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Gene Regulatory Network Analysis