Discovering Interpretable Biological Concepts in Single-cell RNA-seq Foundation Models
Charlotte Claye (MICS), Pierre Marschall, Wassila Ouerdane (MICS), Céline Hudelot (MICS), Julien Duquesne

TL;DR
This paper introduces a new interpretability framework for single-cell RNA-seq models that uses attribution methods and ontology-driven analysis to uncover biologically meaningful concepts, aiding hypothesis generation.
Contribution
It presents a novel concept-based interpretability framework combining attribution with biological pathway enrichment for single-cell RNA-seq models.
Findings
Concepts improve interpretability over individual neurons.
Framework successfully interprets models trained on immune cell datasets.
Interpretations align with known biological pathways.
Abstract
Single-cell RNA-seq foundation models achieve strong performance on downstream tasks but remain black boxes, limiting their utility for biological discovery. Recent work has shown that sparse dictionary learning can extract concepts from deep learning models, with promising applications in biomedical imaging and protein models. However, interpreting biological concepts remains challenging, as biological sequences are not inherently human-interpretable. We introduce a novel concept-based interpretability framework for single-cell RNA-seq models with a focus on concept interpretation and evaluation. We propose an attribution method with counterfactual perturbations that identifies genes that influence concept activation, moving beyond correlational approaches like differential expression analysis. We then provide two complementary interpretation approaches: an expert-driven analysis…
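The abstract describes attribution through counterfactual perturbations only at a high level, so the following is a minimal, occlusion-style sketch of the general idea rather than the authors' method. It assumes a toy concept encoder (a linear map followed by ReLU) and an all-zero baseline as the counterfactual value; `concept_activation`, `counterfactual_attribution`, and all weights are hypothetical names and values chosen for illustration.

```python
import numpy as np

def concept_activation(x, w_enc, b_enc, concept_idx):
    """Toy stand-in for 'model + dictionary': ReLU of a linear projection."""
    return np.maximum(x @ w_enc + b_enc, 0.0)[concept_idx]

def counterfactual_attribution(x, baseline, w_enc, b_enc, concept_idx):
    """Score each gene by the drop in concept activation observed when
    that gene alone is replaced by its baseline (counterfactual) value."""
    original = concept_activation(x, w_enc, b_enc, concept_idx)
    scores = np.zeros(x.shape[0])
    for g in range(x.shape[0]):
        x_cf = x.copy()
        x_cf[g] = baseline[g]  # perturb a single gene, keep the rest fixed
        perturbed = concept_activation(x_cf, w_enc, b_enc, concept_idx)
        scores[g] = original - perturbed
    return scores

# Hypothetical setup: 6 genes, 3 concepts; concept 1 depends on genes 0 and 3.
n_genes, n_concepts = 6, 3
w_enc = np.zeros((n_genes, n_concepts))
w_enc[0, 1] = 2.0
w_enc[3, 1] = 1.0
b_enc = np.zeros(n_concepts)

x = np.ones(n_genes)          # one cell's (toy) expression profile
baseline = np.zeros(n_genes)  # assumed counterfactual: gene switched off

scores = counterfactual_attribution(x, baseline, w_enc, b_enc, concept_idx=1)
print(scores)
```

Unlike a differential expression comparison, each score here comes from intervening on one gene and re-evaluating the concept, which is the distinction the abstract draws between attribution and correlational approaches.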
Peer Reviews
Decision · Submitted to ICLR 2026
1. The idea of introducing concepts seems to make biological sense.
2. I think the association between the concepts and GO terms (Fig. 3B and Fig. 4C) is very interesting. It could be explored further in the future.
3. Although some details are missing, the method descriptions are solid.
Maybe I have missed it, but in Section 4.2 it is unclear to me:
1. what the distribution of the attribution scores is, and why 0.05 was selected as the cutoff;
2. which cells are used for calculating the attribution scores.
The benchmark lacks a baseline method. How does the approach compare with a direct low-rank decomposition of the cell embedding matrix? In theory, you can treat the factors as the "concepts" and the loadings as the "concept activations".
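The low-rank baseline suggested in this review can be sketched with a truncated SVD of the cell embedding matrix. This is one possible reading of the reviewer's suggestion, not something evaluated in the paper; the function name `low_rank_concepts` and the synthetic embeddings are hypothetical.

```python
import numpy as np

def low_rank_concepts(embeddings, k):
    """Rank-k SVD of a (cells x dims) embedding matrix.

    Returns per-cell loadings ("concept activations", cells x k) and
    factors ("concepts", k x dims) such that
    activations @ concepts approximates the centered embeddings."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    activations = u[:, :k] * s[:k]  # scale left singular vectors
    concepts = vt[:k]               # top-k right singular vectors
    return activations, concepts

# Synthetic embeddings that are exactly rank 2 (illustrative only).
rng = np.random.default_rng(0)
true_activations = rng.normal(size=(50, 2))
true_directions = rng.normal(size=(2, 8))
embeddings = true_activations @ true_directions

activations, concepts = low_rank_concepts(embeddings, k=2)
reconstruction = activations @ concepts + embeddings.mean(axis=0, keepdims=True)
print(np.abs(reconstruction - embeddings).max())
```

A dense decomposition like this gives every cell a nonzero loading on every factor, which is one way it differs from a sparse dictionary and why a direct comparison would be informative.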
The paper provides a novel way of interpreting the embeddings produced by well-known scRNA-seq models such as scVI and scGPT. The authors also provide an intuitive framework that lets experts who may not be able to run the model themselves analyze the results. I think this work can have important relevance for the biomedical community, as interpretability is a key factor when working with single-cell data.
I don't think this work provides sufficiently sound advances in representation learning or machine learning to be suitable for ICLR. The method applies a known SAE for dictionary learning on top of known representation learning models for scRNA-seq data (scVI, scGPT). While the idea of interpreting the concepts inferred by the SAE is very interesting, the work is better suited to more specialized conferences or journals. In addition, I believe the results are somewhat incomplete. While they provide
The core contribution (a gene attribution method based on counterfactual perturbations) is a well-grounded methodological advancement. The empirical validation is sound: the authors provide evidence that concepts extracted via TopK SAEs are more interpretable than individual neurons from the original models, scGPT and scVI. The study also evaluates the stability of learned concepts across different datasets, a known challenge for SAEs.
Context within the broader field of interpretable ML: The introduction frames the problem primarily as one of post-hoc explanation for "black box" models. However, it would benefit from acknowledging and contrasting its approach with the branch of causal representation learning, e.g., discrepancy-VAE (Zhang et al.), SENA (de la Fuente et al.), and GEARS (Roohani et al.). Discussing why a post-hoc concept extraction approach might be preferable or complementary (e.g., applicable to any pre-traine
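The reviews refer to concepts extracted with TopK SAEs. As a rough illustration of that building block, here is a minimal forward pass of a TopK sparse autoencoder over model embeddings. It is a sketch under assumptions: the weights are random, the pre-encoder bias subtraction and the ReLU applied to the kept values are common conventions that this page does not confirm for the paper, and `topk_sae_forward` is a hypothetical name.

```python
import numpy as np

def topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Encode embeddings, keep the k largest pre-activations per cell,
    zero out the rest, then decode back to embedding space."""
    pre = (x - b_dec) @ w_enc + b_enc
    # Column indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    z[rows, idx] = np.maximum(pre[rows, idx], 0.0)  # sparse concept activations
    recon = z @ w_dec + b_dec
    return z, recon

# Hypothetical sizes: 4 cells, 8-dim embeddings, 16 dictionary concepts.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_enc = rng.normal(size=(8, 16))
b_enc = np.zeros(16)
w_dec = rng.normal(size=(16, 8))
b_dec = np.zeros(8)

z, recon = topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k=3)
print((z != 0).sum(axis=1))  # at most k active concepts per cell
```

In training, the weights would be fit to minimize the reconstruction error between `recon` and `x`; the fixed `k` is what enforces sparsity in place of an L1 penalty.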
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning
