scShapeBench: Discovering geometry from high dimensional scRNAseq data
Andrew J Steindl, João Felipe Rocha, Brian Tshilengi Di Bassinga, Zachary Warren, Matthew Scicluna, César Miguel Valdez Córdova, Shabarni Gupta, Leire Torices, Daniel Neumann, Timothy J. Mann, Ihuan Gunawan, Dhananjay Bhaskar, John G Lock, Christine L Chaffer, Guy Wolf

TL;DR
This paper introduces scShapeBench, a benchmark dataset for shape detection in high-dimensional single-cell RNA sequencing data, and proposes scReebTower, a method based on diffusion geometry that outperforms existing tools at identifying dataset topologies.
Contribution
The paper presents a new benchmark dataset, evaluation metrics, and a baseline method for automated shape detection in single-cell data analysis.
Findings
scReebTower outperforms PAGA and Mapper in shape detection tasks
Synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance
Expert annotations categorize real datasets into four shape types
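To make the synthetic-data finding concrete, here is a minimal sketch of sampling a noisy point cloud from a skeleton graph. It is an illustration under stated assumptions, not the paper's generator: the function name `sample_from_skeleton`, the straight-line edges, the length-proportional edge selection, and the isotropic Gaussian noise controlled by `sigma` are all hypothetical choices made for this example.

```python
import numpy as np


def sample_from_skeleton(nodes, edges, n_points, sigma, seed=0):
    """Sample noisy points along the edges of a skeleton graph.

    Hypothetical helper for illustration only.
    nodes:  (k, d) array of node coordinates in the ambient space.
    edges:  sequence of (i, j) index pairs into `nodes`.
    sigma:  standard deviation of the isotropic Gaussian noise (assumed noise model).
    Returns the (n_points, d) samples and the index of the edge each came from.
    """
    rng = np.random.default_rng(seed)
    nodes = np.asarray(nodes, dtype=float)
    edges = np.asarray(edges, dtype=int)

    starts = nodes[edges[:, 0]]
    ends = nodes[edges[:, 1]]
    lengths = np.linalg.norm(ends - starts, axis=1)

    # Pick an edge for each point with probability proportional to its length,
    # so that density along the skeleton is roughly uniform.
    edge_idx = rng.choice(len(edges), size=n_points, p=lengths / lengths.sum())

    # Pick a uniform position along the chosen edge.
    t = rng.uniform(0.0, 1.0, size=(n_points, 1))
    on_skeleton = (1.0 - t) * starts[edge_idx] + t * ends[edge_idx]

    # Perturb off the skeleton with controlled variance.
    noise = rng.normal(0.0, sigma, size=on_skeleton.shape)
    return on_skeleton + noise, edge_idx


# Example: a Y-shaped (three-branch) skeleton embedded in 50 dimensions.
nodes = np.zeros((4, 50))
nodes[1, 0] = 1.0
nodes[2, 1] = 1.0
nodes[3, 2] = 1.0
edges = [(0, 1), (0, 2), (0, 3)]

points, edge_idx = sample_from_skeleton(nodes, edges, n_points=500, sigma=0.05)
```

Raising `sigma` blurs the branches into one another, which is one way a benchmark of this kind could vary difficulty while the ground-truth graph stays fixed.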
Abstract
High-dimensional point cloud data arise across many scientific domains, especially single-cell biology. The shapes or topologies of these datasets determine the types of information that can be extracted. For example, clustered data supports cell-type identification, trajectory structures support transition analysis, and archetypal structures capture continua of cellular behaviors. Existing analysis pipelines often assume a specific shape. The standard Seurat pipeline combines UMAP visualization with Louvain clustering and therefore assumes clustered data, while tools such as Monocle and SPADE assume tree-like structures, and flow-based models such as MIOFlow and Conditional Flow Matching target trajectories. Choosing which pipeline to apply is therefore often left to bioinformaticians who visually inspect datasets before selecting an analysis strategy. With the rise of agentic AI…
