MMIL: A novel algorithm for disease associated cell type discovery
Erin Craig, Timothy Keyes, Jolanda Sarno, Maxim Zaslavsky, Garry Nolan, Kara Davis, Trevor Hastie, Robert Tibshirani

TL;DR
This paper introduces MMIL, an expectation maximization algorithm that trains cell-level classifiers from patient-level labels, identifying disease-associated cells in single-cell datasets that lack cell-level labels.
Contribution
The paper presents MMIL, a mixture modeling approach to multiple instance learning that enables disease-associated cell type discovery from single-cell data without cell-level labels and that can be paired with a variety of classifiers.
Findings
Accurately identifies cancer cells in AML and ALL datasets.
Generalizes across tissues and treatment timepoints.
Can incorporate cell labels into model training when they are known.
Abstract
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease. To address this, we introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation maximization method that enables the training and calibration of cell-level classifiers using patient-level labels. Our approach can be used to train e.g. lasso logistic regression models, gradient boosted trees, and neural networks. When applied to clinically-annotated, primary patient samples in Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL), our method accurately identifies cancer cells, generalizes across tissues and treatment timepoints, and selects biologically relevant features. In addition, MMIL is capable of incorporating cell labels into model training when they are known, providing a powerful framework for leveraging both labeled and…
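The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea (expectation maximization over latent cell labels, with a classifier refit at each step), not the authors' implementation. The function names `fit_soft_logreg` and `em_cell_classifier`, the plain logistic regression fitted by gradient descent, and the parameter `rho` (an assumed prior fraction of disease cells within diseased patients) are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_soft_logreg(X, q, beta=None, n_steps=300, lr=0.2):
    """Logistic regression on soft targets q in [0, 1], by plain gradient descent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend an intercept column
    if beta is None:
        beta = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30.0, 30.0)))
        beta = beta - lr * Xb.T @ (p - q) / len(X)  # cross-entropy gradient step
    return beta

def em_cell_classifier(X, patient_label, rho, n_iter=20):
    """
    X             : (n_cells, n_features) cell measurements
    patient_label : (n_cells,) 0/1 label of the patient each cell came from
    rho           : ASSUMED prior fraction of disease cells in diseased patients
    Returns (beta, q): classifier coefficients and per-cell P(disease cell).
    """
    sick = patient_label == 1
    q = sick.astype(float)  # start: every cell inherits its patient's label
    beta = None
    for _ in range(n_iter):
        # M-step: refit the cell classifier to the current soft labels
        # (warm-started from the previous coefficients).
        beta = fit_soft_logreg(X, q, beta)
        logit = np.clip(beta[0] + X @ beta[1:], -30.0, 30.0)
        # E-step: the classifier was trained at class prior pi = mean(q);
        # rescale its odds to the assumed prior rho inside diseased patients.
        pi = np.clip(q.mean(), 1e-6, 1.0 - 1e-6)
        odds = np.exp(logit) * (1.0 - pi) / pi * rho / (1.0 - rho)
        # Cells from healthy patients are taken to be healthy with certainty.
        q = np.where(sick, odds / (1.0 + odds), 0.0)
    return beta, q
```

Calling `em_cell_classifier(X, patient_label, rho=0.3)` returns a probability for every cell; cells from diseased patients with a value above 0.5 would be the candidates for disease-associated cells. Fixing `rho` in advance is a simplification made here to keep the sketch short; any classifier that accepts soft or weighted labels could stand in for the logistic regression.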
Taxonomy
Methods: Logistic Regression
