Nested Atoms Model with Application to Clustering Big Population-Scale Single-Cell Data
Arhit Chakrabarti, Yang Ni, Yuchao Jiang, Bani K. Mallick

TL;DR
This paper introduces the Nested Atoms Model (NAM), a Bayesian nonparametric method for hierarchical clustering of large-scale single-cell data, effectively capturing nested heterogeneity at both individual and cell levels.
Contribution
The paper presents NAM, a novel scalable Bayesian approach that jointly clusters individuals and cells, incorporating group-level variables for high-dimensional single-cell data.
Findings
In simulations, NAM outperforms existing methods that ignore group-level variables.
NAM identifies biologically meaningful clusters aligned with immune cell types.
Applied to real data, NAM reveals genetically similar groups with homogeneous cell profiles.
Abstract
We consider the problem of clustering nested or hierarchical data, where observations are grouped and there are both group-level and observation-level variables. In our motivating OneK1K dataset, observations consist of single-cell RNA-sequencing (scRNA-seq) data from 982 individuals (groups), totaling 1.27 million cells (observations), along with individual-specific genotype data. This type of data would enable the identification of cell types and the investigation of how genetic variations among individuals influence differences in cell-type profiles. Our goal, therefore, is to jointly cluster cells and individuals to capture the heterogeneity across both levels using cell-specific gene expressions as well as individual-specific genotypes. However, existing grouped clustering methods do not incorporate group-level variables, thereby limiting their ability to capture the heterogeneity…
