TL;DR
This paper introduces a donor-aware benchmarking framework for donor-level classification of inflammatory bowel disease (IBD) from scRNA-seq data, and argues that compartment-aware features improve both accuracy and interpretability.
Contribution
The paper evaluates three feature representations (CLR-transformed cell-type composition, GatedStructuralCFN dependency embeddings, and scVI latent embeddings) across two IBD cohorts under donor-aware cross-validation, and highlights the effectiveness of compartment-stratified features.
Findings
Compartment-stratified CLR composition achieves AUROC 0.956 +/- 0.061 on the SCP259 ulcerative colitis cohort.
GatedStructuralCFN embeddings outperform linear models in the colon region of the Kong cohort.
Cross-dataset transfer from Crohn's disease to UC is moderately successful, reaching AUROC 0.833.
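The CLR composition features named in the first finding can be illustrated with a short sketch. It assumes a donors-by-cell-types count matrix and reads "compartment-stratified" as applying the CLR separately within each compartment. That reading, the pseudocount of 0.5, the function names, and the toy data are all assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a donors x cell-types count matrix."""
    counts = np.asarray(counts, dtype=float)
    # The pseudocount (an assumed value) avoids log(0) for absent cell types.
    logged = np.log(counts + pseudocount)
    # Subtracting the row mean of the logs divides by the geometric mean.
    return logged - logged.mean(axis=1, keepdims=True)

def stratified_clr(counts, compartments, pseudocount=0.5):
    """Apply CLR within each compartment (hypothetical reading of the paper)."""
    counts = np.asarray(counts, dtype=float)
    compartments = np.asarray(compartments)
    out = np.empty_like(counts)
    for name in np.unique(compartments):
        cols = compartments == name  # columns belonging to this compartment
        out[:, cols] = clr(counts[:, cols], pseudocount)
    return out

# Hypothetical toy data: 3 donors, 4 cell types in 2 compartments.
toy_counts = np.array([[120, 30, 0, 50], [80, 80, 10, 30], [10, 200, 5, 85]])
toy_compartments = ["immune", "immune", "epithelial", "epithelial"]
features = clr(toy_counts)
stratified = stratified_clr(toy_counts, toy_compartments)
```

Each row of `features` sums to zero, as does each compartment's block of columns within a row of `stratified`. That is the defining property of the CLR and a quick sanity check on any implementation.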
Abstract
Donor-level disease classification from single-cell RNA sequencing (scRNA-seq) requires strict donor-aware cross-validation: naive pipelines that split cells randomly conflate training and test donors, inflating reported performance through pseudoreplication. We present a donor-aware benchmark evaluating three feature representations across two independent IBD cohorts: centered log-ratio (CLR) transformed cell-type composition, GatedStructuralCFN dependency embeddings, and scVI variational autoencoder latent embeddings. The cohorts are the SCP259 ulcerative colitis atlas (UC vs. Healthy, n=30 donors, 51 cell types) and the Kong 2023 Crohn's disease atlas (CD vs. Healthy, n=71 donors, 55-68 cell types across three intestinal regions). Compartment-stratified CLR composition achieves AUROC 0.956 +/- 0.061 on SCP259; GatedStructuralCFN on the same features achieves 0.978 +/- 0.050. In the…
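The donor-aware splitting that the abstract calls for can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function name `donor_folds`, the fold count, and the toy donor labels are assumptions. The idea is to assign whole donors to folds, so that no donor contributes cells to both the training and the test set.

```python
import numpy as np

def donor_folds(donor_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) over cells, holding out whole donors."""
    donor_ids = np.asarray(donor_ids)
    donors = np.unique(donor_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(donors)  # randomize which donors land in which fold
    for held_out in np.array_split(donors, n_splits):
        # Every cell from a held-out donor goes to the test set.
        test_mask = np.isin(donor_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical toy data: 10 donors with 20 cells each.
cell_donors = np.repeat([f"donor_{i}" for i in range(10)], 20)
folds = list(donor_folds(cell_donors, n_splits=5))

# Donors shared between train and test in each fold; all sets should be empty.
overlaps = [set(cell_donors[tr]) & set(cell_donors[te]) for tr, te in folds]
```

A random split over cells would instead place cells from the same donor on both sides of the split, which is the pseudoreplication the abstract warns about. In practice, scikit-learn's `GroupKFold` provides the same grouping guarantee when donor IDs are passed as `groups`.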
