Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control
Ihor Kendiukhov

TL;DR
This paper exhaustively maps the circuits of a single-cell foundation model and reports massive redundancy, a heavy-tailed hub architecture, and layer-dependent control of cell differentiation. The results suggest that earlier interpretability methods based on selective sampling were systematically biased.
Contribution
The paper applies exhaustive circuit tracing, higher-order combinatorial ablation, and causal trajectory steering to Geneformer, exposing systematic annotation bias in prior selective analyses and layer-specific control mechanisms in the model.
Findings
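Exhaustive tracing of all 4,065 active sparse autoencoder features at layer 5 yielded 1,393,850 significant downstream edges, a 27-fold expansion over selective sampling.
A heavy-tailed hub distribution emerged: 1.8% of features account for disproportionate connectivity, and 40% of the top 20 hubs lack biological annotation.
Causal trajectory steering confirmed a layer-dependent causal influence on cell differentiation.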
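To make the hub finding concrete, here is a minimal Python sketch of how one might rank features by out-degree in a traced edge list and measure the share of edges held by the top 1.8%. It is illustrative only. The function names (`out_degrees`, `top_hubs`, `hub_edge_share`), the edge-list representation, and the synthetic Pareto-distributed toy data are all assumptions, and the paper's actual hub criterion may differ.

```python
from collections import Counter
import random


def out_degrees(edges):
    """Count downstream edges per source feature (hypothetical helper)."""
    return Counter(src for src, _dst in edges)


def top_hubs(edges, fraction):
    """Return (feature, out_degree) pairs for the top `fraction` of source features."""
    degrees = out_degrees(edges)
    k = max(1, round(len(degrees) * fraction))
    return degrees.most_common(k)


def hub_edge_share(edges, fraction):
    """Fraction of all edges that originate from the top `fraction` of features."""
    hubs = top_hubs(edges, fraction)
    return sum(count for _feature, count in hubs) / len(edges)


# Synthetic toy data, NOT the paper's results: each source feature gets a
# Pareto-distributed number of outgoing edges, which produces a heavy tail.
rng = random.Random(0)
n_features = 1000
edges = []
for src in range(n_features):
    n_out = min(int(rng.paretovariate(1.2)), 500)  # always at least 1
    edges.extend((src, rng.randrange(n_features)) for _ in range(n_out))

share = hub_edge_share(edges, 0.018)
print(f"Top 1.8% of features hold {share:.1%} of edges in this toy graph")
```

In a heavy-tailed graph the printed share is well above 1.8%, which is the sense in which a small set of hubs carries disproportionate connectivity.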
Abstract
Mechanistic interpretability of biological foundation models has relied on selective feature sampling, pairwise interaction testing, and observational trajectory analysis. Each of these can introduce systematic bias. Here we present three experiments that address these limitations through exhaustive circuit tracing, higher-order combinatorial ablation, and causal trajectory steering in Geneformer, a transformer-based single-cell foundation model. First, exhaustive tracing of all 4,065 active sparse autoencoder features at layer 5 yields 1,393,850 significant downstream edges, a 27-fold expansion over selective sampling. This reveals a heavy-tailed hub distribution in which 1.8 percent of features account for disproportionate connectivity and 40 percent of the top 20 hubs lack biological annotation. These results indicate systematic annotation bias in prior selective analyses. Second, three…
Taxonomy
Topics: Gene Regulatory Network Analysis · Genomics and Chromatin Dynamics · Single-cell and spatial transcriptomics
