# Securing diagonal integration of multimodal single-cell data against ambiguous mapping

**Authors:** Han Zhou, Kai Cao, Yang Young Lu

PMC · DOI: 10.1093/bioinformatics/btaf345 · 2025-06-14

## TL;DR

This paper introduces SONATA, a diagnostic add-on that detects ambiguous mappings and guards against the misleading integrations they cause in diagonal integration of multimodal single-cell data.

## Contribution

SONATA is a novel diagnostic method that identifies ambiguous mappings in diagonal integration of multimodal single-cell data.

## Key findings

- Artificial integrations occur across all mainstream diagonal integration methods, yet they are largely overlooked.
- SONATA effectively detects and safeguards against misleading integrations in multimodal datasets.
- SONATA is compatible with existing integration pipelines as an add-on diagnostic tool.

## Abstract

Recent advances in single-cell multimodal omics technologies enable the exploration of cellular systems at unprecedented resolution, leading to the rapid generation of multimodal datasets that require sophisticated integration methods. Diagonal integration has emerged as a flexible solution for integrating heterogeneous single-cell data without relying on shared cells or features. However, the absence of anchoring elements introduces the risk of artificial integrations, where cells across modalities are incorrectly aligned due to ambiguous mapping.

To address this challenge, we propose SONATA (Securing diagOnal iNtegrATion against Ambiguous mapping), a novel diagnostic method designed to detect potential artificial integrations resulting from ambiguous mappings in diagonal data integration. SONATA identifies ambiguous alignments by quantifying cell–cell ambiguity within the data manifold, ensuring that biologically meaningful integrations are distinguished from spurious ones. SONATA is not designed to replace any existing pipeline for diagonal data integration; instead, it works as an add-on to an existing pipeline to make integration more reliable. Through a comprehensive evaluation on both simulated and real multimodal single-cell datasets, we observe that artificial integrations in diagonal data integration are widespread yet surprisingly overlooked, occurring across all mainstream diagonal integration methods. We demonstrate SONATA’s ability to safeguard against misleading integrations and provide actionable insights into potential integration failures across mainstream methods. Our approach offers a robust framework for ensuring the reliability and interpretability of multimodal single-cell data integration.
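To make the notion of cell–cell ambiguity concrete, the following is a minimal toy sketch and not SONATA's actual algorithm, which the abstract does not specify. It assumes one simple proxy: two cells are treated as mutually ambiguous when their sorted geodesic distance profiles on a k-nearest-neighbor graph are identical, because a method that aligns modalities by manifold geometry alone has no basis for telling such cells apart. The function names, the choice of `k`, and the tolerance `tol` are all hypothetical.

```python
import math

def geodesic_distances(points, k):
    """All-pairs shortest-path distances on a symmetrized k-NN graph."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    geo = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        # Index 0 of the sorted order is the point itself, so skip it.
        neighbors = sorted(range(n), key=lambda j: euclid[i][j])[1:k + 1]
        for j in neighbors:
            geo[i][j] = geo[j][i] = euclid[i][j]
    # Floyd-Warshall relaxation over every intermediate node m.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via_m = geo[i][m] + geo[m][j]
                if via_m < geo[i][j]:
                    geo[i][j] = via_m
    return geo

def ambiguous_pairs(points, k=2, tol=1e-9):
    """Pairs of distinct cells whose sorted geodesic profiles coincide.

    Assumed proxy for ambiguity (hypothetical, not taken from the paper):
    identical profiles mean the manifold looks the same from both cells.
    """
    geo = geodesic_distances(points, k)
    profiles = [sorted(row) for row in geo]
    n = len(points)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            gap = sum(abs(a - b) for a, b in zip(profiles[i], profiles[j])) / n
            if gap < tol:
                pairs.append((i, j))
    return pairs

# Toy "trajectory": 11 evenly spaced cells on a line. Reversing the line
# preserves every pairwise distance, so cell i is interchangeable with 10 - i.
cells = [(float(x), 0.0) for x in range(11)]
pairs = ambiguous_pairs(cells)
print(pairs)
```

On this symmetric toy trajectory, every cell except the midpoint has a mirror-image partner, which is the kind of situation in which a distance-preserving alignment could map one modality onto the other in reverse without any change in its objective. Real data would need a tolerance suited to noise and a scalable shortest-path routine; the cubic-time loop here is only for readability.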

The source code is available at https://github.com/batmen-lab/SONATA.

## Full-text entities

- **Genes:** GEM (GTP binding protein overexpressed in skeletal muscle) [NCBI Gene 2669] {aka KIR}, Gem (GTP binding protein overexpressed in skeletal muscle) [NCBI Gene 297902], SNAR-E (small NF90 (ILF3) associated RNA E) [NCBI Gene 100170220]
- **Diseases:** sc (MESH:C535687)
- **Chemicals:** sc-NMT (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** H1 — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_HA53), BJ — Homo sapiens (Human), Telomerase immortalized cell line (CVCL_6573), GM12878 — Homo sapiens (Human), Transformed cell line (CVCL_7526), GM — Capra hircus (Goat), Goat melanoma, Cancer cell line (CVCL_C7TS), K562 — Homo sapiens (Human), Blast phase chronic myelogenous leukemia, BCR-ABL1 positive, Cancer cell line (CVCL_0004)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12205172/full.md

---
Source: https://tomesphere.com/paper/PMC12205172