# scMILD: Single-cell multiple instance learning for sample classification and associated subpopulation discovery

**Authors:** Kyeonghun Jeong, Jinwook Choi, Kwangsoo Kim

PMC · DOI: 10.1016/j.isci.2026.115284 · 2026-03-10

## TL;DR

scMILD is a weakly supervised multiple instance learning framework that classifies samples and identifies the disease-associated cell subpopulations driving those classifications, using only sample-level labels.

## Contribution

scMILD introduces a weakly supervised multiple instance learning framework, built on a dual-branch model with an orthogonal projection loss, that discovers condition-associated subpopulations without requiring cell-level labels.
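To make the multiple instance learning setting concrete, the following is a minimal sketch of attention-based pooling, one common way to turn per-cell features into a sample-level prediction while producing a per-cell weight. It is an illustration only: this summary does not state which aggregation scMILD uses, and the function names (`attention_pool`, `softmax`), the weight vectors, and the toy data are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(cells, w_att, w_clf, bias=0.0):
    """Score one sample (a "bag" of cells) from cell-level features.

    cells : list of feature vectors, one per cell in the sample
    w_att : hypothetical attention weights (one per feature)
    w_clf : hypothetical classifier weights (one per feature)
    Returns the sample-level probability and the per-cell attention weights.
    """
    # One unnormalised attention score per cell.
    scores = [sum(w * x for w, x in zip(w_att, cell)) for cell in cells]
    weights = softmax(scores)

    # Sample embedding = attention-weighted average of the cell features.
    dim = len(cells[0])
    bag = [sum(a * cell[j] for a, cell in zip(weights, cells)) for j in range(dim)]

    # Logistic classifier on the sample embedding.
    logit = sum(w * x for w, x in zip(w_clf, bag)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, weights


# Toy sample: two "background" cells and one cell with a strong signal.
cells = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.5]]
prob, weights = attention_pool(cells, w_att=[1.0, 1.0], w_clf=[1.0, 1.0], bias=-2.0)
```

Only the sample-level label would be needed to train such a model, yet the attention weights give a per-cell ranking; in this toy input the third cell receives most of the weight.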

## Key findings

- scMILD successfully identifies condition-associated cells in single-cell datasets using only sample-level labels.
- The method reveals monocyte state transitions during COVID-19 progression and distinguishes shared from disease-specific signatures in lupus (SLE) and COVID-19.
- Validation on diverse disease datasets confirms scMILD's ability to retrieve known biological signatures.

## Abstract

Linking cellular states to clinical phenotypes is a major challenge in single-cell analysis. Here, we present single-cell multiple instance learning for sample classification and associated subpopulation discovery (scMILD), a weakly supervised multiple instance learning framework that robustly identifies condition-associated cells using only sample-level labels. After systematically validating scMILD’s accuracy through controlled simulations, we applied it to diverse disease datasets, confirming its ability to retrieve known biological signatures. Building on this, our sample-informed analysis of scMILD-identified monocytes in COVID-19 revealed a temporal transition from an early antiviral to a late stress-response state. Furthermore, in a cross-disease application, a model trained on COVID-19 successfully stratified patients with Lupus and distinguished shared inflammatory states from disease-specific ones. scMILD thus provides a validated and versatile strategy to dissect cellular heterogeneity, bridging single-cell observations with high-level phenotypes.

- scMILD identifies condition-associated cells using only sample-level labels
- Dual-branch model with orthogonal projection loss improves cell separation
- Sample-informed analysis reveals monocyte state transitions in COVID-19 progression
- Cross-disease modeling reveals shared and specific signatures in SLE and COVID-19
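The orthogonal projection loss named in the highlights can be illustrated with one widely used formulation: pull same-class feature vectors toward cosine similarity 1 and push different-class vectors toward cosine similarity 0. The sketch below assumes that formulation; this summary does not give the exact loss scMILD uses, and the function names and the `gamma` weighting are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (guarded against zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, 1e-12)


def orthogonal_projection_loss(features, labels, gamma=0.5):
    """Assumed form: (1 - s) + gamma * |d|.

    s = mean cosine similarity over same-label pairs
    d = mean cosine similarity over different-label pairs
    """
    same, diff = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(features[i], features[j])
            if labels[i] == labels[j]:
                same.append(sim)
            else:
                diff.append(sim)
    # With no pairs of a given kind, that term contributes no penalty.
    s = sum(same) / len(same) if same else 1.0
    d = sum(diff) / len(diff) if diff else 0.0
    return (1.0 - s) + gamma * abs(d)


# Perfectly separated toy features: classes lie on orthogonal axes.
separated = orthogonal_projection_loss(
    [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]], [0, 0, 1, 1]
)
# Two identical vectors with different labels are penalised.
collapsed = orthogonal_projection_loss([[1.0, 0.0], [1.0, 0.0]], [0, 1])
```

Under this assumed form the loss is zero when classes occupy orthogonal directions and grows as differently labeled features align, which is the intuition behind the "improves cell separation" highlight.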

Biocomputational method; Classification of bioinformatical subject; Transcriptomics

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096), Lupus (MONDO:0004670)

## Full-text entities

- **Diseases:** Lupus (MESH:D008180), Inflammation (MESH:D007249), COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13019583/full.md

---
Source: https://tomesphere.com/paper/PMC13019583