scMILD: Single-cell multiple instance learning for sample classification and associated subpopulation discovery
Kyeonghun Jeong, Jinwook Choi, Kwangsoo Kim

TL;DR
scMILD is a machine learning framework that identifies cell subpopulations linked to diseases using only sample-level labels.
Contribution
scMILD introduces a weakly supervised multiple instance learning framework for subpopulation discovery without requiring cell-level labels.
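This summary does not describe scMILD's network architecture. To show how a multiple instance learning model can be trained on sample-level labels and still score individual cells, here is a minimal NumPy sketch of generic attention-based MIL pooling in the style of Ilse et al. (2018). Each sample is treated as a bag and each cell as an instance. The function and parameter names (`attention_mil_forward`, `bag_bce`, `V`, `w`, `clf_w`, `clf_b`) are hypothetical, and the weights are random rather than trained. This is an illustration of the general technique, not scMILD's implementation.

```python
import numpy as np


def attention_mil_forward(cells, V, w, clf_w, clf_b):
    """Generic attention-based MIL pooling over one sample (bag) of cells.

    cells: (n_cells, n_features) per-cell embeddings (hypothetical input).
    V: (n_hidden, n_features), w: (n_hidden,) attention parameters.
    clf_w: (n_features,), clf_b: scalar sample-level classifier parameters.
    Returns (sample probability, per-cell attention weights).
    """
    # Per-cell attention score: w^T tanh(V h_i)
    scores = np.tanh(cells @ V.T) @ w          # shape (n_cells,)
    scores = scores - scores.max()             # stabilize the softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    # Sample embedding = attention-weighted mean of cell embeddings
    bag_embedding = attn @ cells               # shape (n_features,)
    logit = bag_embedding @ clf_w + clf_b
    prob = 1.0 / (1.0 + np.exp(-logit))        # sample-level probability
    return prob, attn


def bag_bce(prob, label, eps=1e-12):
    """Binary cross-entropy against the sample-level label only."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(label * np.log(prob) + (1 - label) * np.log(1 - prob))


# Usage with random (untrained) parameters on one synthetic sample
rng = np.random.default_rng(0)
n_cells, n_features, n_hidden = 500, 32, 16
cells = rng.normal(size=(n_cells, n_features))
V = rng.normal(scale=0.1, size=(n_hidden, n_features))
w = rng.normal(scale=0.1, size=n_hidden)
clf_w = rng.normal(scale=0.1, size=n_features)

prob, attn = attention_mil_forward(cells, V, w, clf_w, 0.0)
loss = bag_bce(prob, label=1)                  # no cell-level labels used
top_cells = np.argsort(attn)[::-1][:10]        # cells with highest attention
```

Only the sample label enters the loss, so the per-cell attention weights come out of training as a by-product. Ranking cells by these weights is one common way to nominate condition-associated cells in attention-based MIL.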
Findings
scMILD successfully identifies condition-associated cells in single-cell datasets using only sample-level labels.
The method reveals monocyte state transitions in COVID-19 progression and distinguishes shared and disease-specific signatures in Lupus and COVID-19.
Validation on diverse disease datasets confirms scMILD's ability to retrieve known biological signatures.
Abstract
Linking cellular states to clinical phenotypes is a major challenge in single-cell analysis. Here, we present single-cell multiple instance learning for sample classification and associated subpopulation discovery (scMILD), a weakly supervised multiple instance learning framework that robustly identifies condition-associated cells using only sample-level labels. After systematically validating scMILD’s accuracy through controlled simulations, we applied it to diverse disease datasets, confirming its ability to retrieve known biological signatures. Building on this, our sample-informed analysis of scMILD-identified monocytes in COVID-19 revealed a temporal transition from an early antiviral to a late stress-response state. Furthermore, in a cross-disease application, a model trained on COVID-19 successfully stratified patients with Lupus and distinguished shared inflammatory states from…
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning
