# Coralysis enables sensitive identification of imbalanced cell types and states in single-cell data via multi-level integration

**Authors:** António G G Sousa, Johannes Smolander, Sini Junttila, Laura L Elo

PMC · DOI: 10.1093/nar/gkaf1128 · Nucleic Acids Research · 2025-11-13

## TL;DR

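Coralysis is an all-in-one package for single-cell data analysis that integrates datasets, annotates cells by reference mapping, and identifies fine-grained cell states, with a particular advantage when similar cell types are imbalanced or missing across datasets.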

## Contribution

Coralysis combines a sensitive integration algorithm, reference mapping for automatic cell-type annotation, and fine-grained cell-state identification in a single package.

## Key findings

- Coralysis performs consistently well across diverse integration tasks and outperforms state-of-the-art methods particularly when similar cell types are imbalanced or missing across datasets.
- It provides cell-specific probability scores for identifying transient and stable cell states.
- The tool works robustly across transcriptomic and proteomic single-cell data types.
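
To make the idea of cell-specific probability scores concrete, the following is a minimal Python sketch, not Coralysis code or its API. It assumes each cell carries a vector of cluster-membership probabilities, and that a cell whose highest probability is large can be treated as "stable" while one with a lower maximum is treated as "transient". The function name `label_cell_states`, the 0.8 threshold, and the toy data are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def label_cell_states(
    cell_probs: Dict[str, List[float]],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, str]:
    """Label each cell 'stable' or 'transient' from its membership probabilities.

    Assumption for illustration: a high maximum probability means the cell
    sits firmly in one cluster (stable); a lower maximum means the cell is
    spread over several clusters (transient).
    """
    labels = {}
    for cell_id, probs in cell_probs.items():
        # Each cell's probabilities should form a distribution over clusters.
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError(f"probabilities for {cell_id} do not sum to 1")
        labels[cell_id] = "stable" if max(probs) >= threshold else "transient"
    return labels


# Toy data: three cells, three clusters.
cell_probs = {
    "cell_a": [0.95, 0.03, 0.02],
    "cell_b": [0.50, 0.45, 0.05],
    "cell_c": [0.10, 0.85, 0.05],
}
labels = label_cell_states(cell_probs)
```

In this toy input, `cell_b` is split almost evenly between two clusters, so it is the only cell labelled transient under the assumed threshold.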

## Abstract

Complex single-cell analyses now routinely integrate multiple datasets, followed by cell-type annotation and differential expression analysis. Current state-of-the-art integration methods often struggle with imbalanced cell types across datasets, particularly when highly similar but distinct cell types are not present in all datasets. Inaccurate integration leads to incorrect annotations, affecting downstream analyses such as differential expression. To streamline single-cell data analysis, we introduce Coralysis, an all-in-one package featuring a sensitive integration algorithm, reference mapping for accurate automatic annotation, and fine-grained cell-state identification. We demonstrate that Coralysis shows consistently high performance across diverse integration tasks, outperforming state-of-the-art methods particularly in challenging settings where similar cell types are imbalanced or missing. It accurately predicts cell-type identities across various annotation scenarios. A key strength of Coralysis is its ability to provide cell-specific probability scores, enabling the identification of transient and stable cell states, along with their differential expression patterns. Importantly, Coralysis performs robustly on different types of single-cell data, from transcriptomics to proteomics. Overall, Coralysis includes all the main steps of single-cell data analysis; it preserves subtle biological variation by improving the integration and annotation of imbalanced cell types, and it identifies fine-grained cell states, enabling a faithful analysis of the cellular landscape in complex single-cell experiments.
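
The imbalance described in the abstract can be checked before any integration is run. The sketch below is illustrative only and is not part of Coralysis: it assumes per-cell type labels are already available for each dataset, and the helper names `cell_type_proportions` and `missing_cell_types`, as well as the toy batches, are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def cell_type_proportions(
    labels_by_dataset: Dict[str, List[str]],
) -> Dict[str, Dict[str, float]]:
    """Return the fraction of cells of each type within each dataset."""
    proportions = {}
    for dataset, labels in labels_by_dataset.items():
        counts = Counter(labels)
        total = len(labels)
        proportions[dataset] = {ct: n / total for ct, n in counts.items()}
    return proportions


def missing_cell_types(
    labels_by_dataset: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """List, per dataset, the cell types seen elsewhere but absent here."""
    all_types = set().union(*(set(l) for l in labels_by_dataset.values()))
    return {
        dataset: sorted(all_types - set(labels))
        for dataset, labels in labels_by_dataset.items()
    }


# Toy example: two batches with different compositions.
labels_by_dataset = {
    "batch1": ["CD4 T"] * 6 + ["CD8 T"] * 4,
    "batch2": ["CD4 T"] * 9 + ["NK"] * 1,
}
proportions = cell_type_proportions(labels_by_dataset)
missing = missing_cell_types(labels_by_dataset)
```

Here `batch1` lacks NK cells and `batch2` lacks CD8 T cells, while CD4 T cells make up 60% and 90% of the two batches respectively. This is the kind of setting the abstract describes as challenging for integration.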

## Full-text entities

- **Genes:** Cd14 (CD14 molecule) [NCBI Gene 60350], AIF1 (allograft inflammatory factor 1) [NCBI Gene 199] {aka AIF-1, IBA1, IRT-1, IRT1}, Cd4 (CD4 antigen) [NCBI Gene 12504] {aka L3T4, Ly-4}, CSTA (cystatin A) [NCBI Gene 1475] {aka AREI, PSS4, STF1, STFA}, Fcgr4 (Fc receptor, IgG, low affinity IV) [NCBI Gene 246256] {aka 4833442P21Rik, CD16-2, FcgRIV, FcgammaRIV, Fcgr3a, Fcrl3}, CCL5 (C-C motif chemokine ligand 5) [NCBI Gene 6352] {aka D17S136E, RANTES, SCYA5, SIS-delta, SISd, TCP228}, CD34 (CD34 molecule) [NCBI Gene 947], CD1C (CD1c molecule) [NCBI Gene 911] {aka BDCA1, CD1, R7}, LTB (lymphotoxin beta) [NCBI Gene 4050] {aka TNFC, TNFSF3, TNLG1C, p33}, BLVRB (biliverdin reductase B) [NCBI Gene 645] {aka BVRB, FLR, HEL-S-10, SDR43U1}, TYMP (thymidine phosphorylase) [NCBI Gene 1890] {aka ECGF, ECGF1, MEDPS1, MNGIE, MTDPS1, PDECGF}, Cdk1 (cyclin dependent kinase 1) [NCBI Gene 12534] {aka Cdc2, Cdc2a, p34<CDC2>}, CMC1 (C-X9-C motif containing 1) [NCBI Gene 152100] {aka C3orf68, cmc1p}, S100A9 (S100 calcium binding protein A9) [NCBI Gene 6280] {aka 60B8AG, CAGB, CFAG, CGLB, L1AG, LIAG}, Ptma (prothymosin alpha) [NCBI Gene 19231] {aka Thym}, LILRA4 (leukocyte immunoglobulin like receptor A4) [NCBI Gene 23547] {aka CD85g, ILT7}, Cd14 (CD14 antigen) [NCBI Gene 12475], XCL2 (X-C motif chemokine ligand 2) [NCBI Gene 6846] {aka SCM-1b, SCM1B, SCYC2}, Ckb (creatine kinase, brain) [NCBI Gene 12709] {aka B-CK, Bck, CPK-B, Ck-3, Ck3, Ckbb}, GNLY (granulysin) [NCBI Gene 10578] {aka D2S69E, LAG-2, LAG2, NKG5, TLA519}, KRT16 (keratin 16) [NCBI Gene 3868] {aka CK16, FNEPPK, K16, K1CP, KRT16A, NEPPK}, LEF1 (lymphoid enhancer binding factor 1) [NCBI Gene 51176] {aka ECTD1, ECTD17, LEF-1, TCF10, TCF1ALPHA, TCF7L3}, CST7 (cystatin F) [NCBI Gene 8530] {aka CMAP}, PLA2G7 (phospholipase A2 group VII) [NCBI Gene 7941] {aka LDL-PLA2, LP-PLA2, PAFAD, PAFAH}, LDLRAP1 (low density lipoprotein receptor adaptor protein 1) [NCBI Gene 26119] {aka ARH, ARH1, ARH2, FHCB1, FHCB2, FHCL4}, CST3 
(cystatin C) [NCBI Gene 1471] {aka ADLDWA, ARMD11, HEL-S-2}, CD79A (CD79a molecule) [NCBI Gene 973] {aka IGA, IGAlpha, MB-1, MB1}, Igfbp5 (insulin-like growth factor binding protein 5) [NCBI Gene 16011] {aka IGFBP-5, IGFBP-5P}, FAM53B (family with sequence similarity 53 member B) [NCBI Gene 9679] {aka KIAA0140, bA12J10.2, smp}, FGFBP2 (fibroblast growth factor binding protein 2) [NCBI Gene 83888] {aka HBP17RP, KSP37}, CCL8 (C-C motif chemokine ligand 8) [NCBI Gene 6355] {aka HC14, MCP-2, MCP2, SCYA10, SCYA8}, Fcgr3a (Fc gamma receptor 3A) [NCBI Gene 304966] {aka CD16-2, Fcgr4}, Sfrp1 (secreted frizzled-related protein 1) [NCBI Gene 20377] {aka 2210415K03Rik, sFRP-1}, XBP1 (X-box binding protein 1) [NCBI Gene 7494] {aka TREB-5, TREB5, XBP-1, XBP2}, LYAR (Ly1 antibody reactive) [NCBI Gene 55646] {aka ZC2HC2, ZLYAR}, CD63 (CD63 molecule) [NCBI Gene 967] {aka AD1, HOP-26, ME491, MLA1, OMA81H, Pltgp40}, S100A8 (S100 calcium binding protein A8) [NCBI Gene 6279] {aka 60B8AG, CAGA, CFAG, CGLA, CP-10, L1Ag}, S100a10 (S100 calcium binding protein A10 (calpactin)) [NCBI Gene 20194] {aka 42C, CAL12, CLP11, Cal1l, p10, p11}, Cdkn3 (cyclin dependent kinase inhibitor 3) [NCBI Gene 72391] {aka 2410006H10Rik, KAP}, LST1 (leukocyte specific transcript 1) [NCBI Gene 7940] {aka B144, D6S49E, LST-1}, GZMB (granzyme B) [NCBI Gene 3002] {aka C11, CCPI, CGL-1, CGL1, CSP-B, CSPB}, Cd4 (Cd4 molecule) [NCBI Gene 24932] {aka W3/25, p55}, FCER1A (Fc epsilon receptor Ia) [NCBI Gene 2205] {aka FCE1A, FCERIA, FcERI}, IFNG (interferon gamma) [NCBI Gene 3458] {aka IFG, IFI, IMD69}, Tmem88 (transmembrane protein 88) [NCBI Gene 67020] {aka 2600017H02Rik}, IL7R (interleukin 7 receptor) [NCBI Gene 3575] {aka CD127, CDW127, IL-7R-alpha, IL-7Ralpha, IL7RA, IL7Ralpha}, IL32 (interleukin 32) [NCBI Gene 9235] {aka IL-32alpha, IL-32beta, IL-32delta, IL-32gamma, NK4, TAIF}, CD3D (CD3 delta subunit of T-cell receptor complex) [NCBI Gene 915] {aka CD3-DELTA, CD3DELTA, IMD19, T3D}, NPM1 (nucleophosmin 1) [NCBI 
Gene 4869] {aka B23, NPM}, IFNA1 (interferon alpha 1) [NCBI Gene 3439] {aka IFL, IFN, IFN-ALPHA, IFN-alphaD, IFNA13, IFNA@}, SPOCK2 (SPARC (osteonectin), cwcv and kazal like domains proteoglycan 2) [NCBI Gene 9806] {aka testican-2}, CTRL (chymotrypsin like) [NCBI Gene 1506] {aka CTRL1}, CD48 (CD48 molecule) [NCBI Gene 962] {aka BCM1, BLAST, BLAST1, MEM-102, SLAMF2, hCD48}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, KLRG1 (killer cell lectin like receptor G1) [NCBI Gene 10219] {aka 2F1, CLEC15A, MAFA, MAFA-2F1, MAFA-L, MAFA-LIKE}, LINC00926 (long intergenic non-protein coding RNA 926) [NCBI Gene 283663], MS4A1 (membrane spanning 4-domains A1) [NCBI Gene 931] {aka B1, Bp35, CD20, CVID5, FMC7, LEU-16}, CFD (complement factor D) [NCBI Gene 1675] {aka ADIPSIN, ADN, DF, PFD}, APMAP (adipocyte plasma membrane associated protein) [NCBI Gene 57136] {aka BSCv, C20orf3}, CTSW (cathepsin W) [NCBI Gene 1521] {aka LYPN}
- **Diseases:** Cancer (MESH:D009369), H1N1 influenza (MESH:D007251), HIV-infected (MESH:D015658), infected (MESH:D007239)
- **Chemicals:** ADT (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** EN-1 — Bos taurus (Bovine), Spontaneously immortalized cell line (CVCL_L860), S21 — Mus musculus (Mouse), Transformed cell line (CVCL_K245)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12614221/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12614221/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12614221/full.md

---
Source: https://tomesphere.com/paper/PMC12614221