Statistical File Matching of Flow Cytometry Data
Gyemin Lee (1), William Finn (2), Clayton Scott (1,3) ((1) Department of Electrical Engineering and Computer Science, University of Michigan, (2) Department of Pathology, University of Michigan, (3) Department of Statistics, University of Michigan)

TL;DR
This paper proposes a novel method for imputing high-dimensional flow cytometry data by combining clustering with a restricted nearest neighbor approach, addressing issues of spurious subpopulations and missing data.
Contribution
It introduces a mixture model-based EM algorithm with domain-informed initialization for clustering with missing data in flow cytometry analysis.
Findings
Effective imputation of high-dimensional data demonstrated on real datasets
Reduces spurious subpopulations caused by naive nearest neighbor methods
Improves clustering accuracy in multidimensional flow cytometry analysis
Abstract
Flow cytometry is a technology that rapidly measures antigen-based markers associated with cells in a cell population. Although analysis of flow cytometry data has traditionally considered one or two markers at a time, there has been increasing interest in multidimensional analysis. However, flow cytometers are limited in the number of markers they can jointly observe, which is typically a fraction of the number of markers of interest. For this reason, practitioners often perform multiple assays based on different, overlapping combinations of markers. In this paper, we address the challenge of imputing the high-dimensional, jointly distributed values of marker attributes based on overlapping marginal observations. We show that simple nearest-neighbor-based imputation can lead to spurious subpopulations in the imputed data, and introduce an alternative approach based on nearest neighbor…
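The contrast between plain nearest neighbor imputation and a cluster-restricted variant can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`nn_impute`, `cluster_restricted_nn_impute`), the toy values, and the assumption that cluster labels are already available for both files are all hypothetical. In the paper the labels would come from a mixture model fitted to data with missing values; here they are simply given.

```python
import numpy as np

def nn_impute(query_common, donor_common, donor_specific):
    """Copy each query cell's missing markers from its nearest donor cell,
    where distance is measured on the markers both files share."""
    # Squared Euclidean distance between every query row and every donor row.
    diff = query_common[:, None, :] - donor_common[None, :, :]
    nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
    return donor_specific[nearest]

def cluster_restricted_nn_impute(query_common, query_labels,
                                 donor_common, donor_specific, donor_labels):
    """Same as nn_impute, but donors are searched only within the query
    cell's own cluster. Labels are assumed to be given (hypothetical)."""
    imputed = np.empty((query_common.shape[0], donor_specific.shape[1]))
    for k in np.unique(query_labels):
        q = query_labels == k
        d = donor_labels == k
        if not d.any():
            # Assumption of this sketch: fall back to all donors
            # when a cluster has no donor cells.
            d = np.ones(donor_labels.shape[0], dtype=bool)
        imputed[q] = nn_impute(query_common[q], donor_common[d], donor_specific[d])
    return imputed

# Toy data: two clusters that overlap on the shared marker but differ
# strongly on the marker measured only in the donor file.
query_common = np.array([[0.0], [1.0]])
query_labels = np.array([0, 1])
donor_common = np.array([[0.1], [0.9]])
donor_labels = np.array([1, 0])
donor_specific = np.array([[10.0], [-10.0]])

naive = nn_impute(query_common, donor_common, donor_specific)
restricted = cluster_restricted_nn_impute(
    query_common, query_labels, donor_common, donor_specific, donor_labels
)
print("plain NN:          ", naive.ravel())
print("cluster-restricted:", restricted.ravel())
```

In this toy case the plain version pairs each cell with a donor from the other cluster, because only the shared marker is compared; that is the kind of mismatch that can produce marker combinations belonging to no real subpopulation. The restricted version keeps each imputed value within the cell's own cluster.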
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Bayesian Methods and Mixture Models
