# Multi-omics single-cell data alignment and integration with enhanced contrastive learning and differential attention mechanism

**Authors:** Tianjiao Zhang, Zhongqian Zhao, Hongfei Zhang, Zhenao Wu, Fang Wang, Guohua Wang

PMC · DOI: 10.1093/bioinformatics/btaf443 · Bioinformatics · 2025-08-07

## TL;DR

This paper introduces scECDA, a new method for integrating multi-omics single-cell data to improve cell type identification accuracy.

## Contribution

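scECDA pairs an independently designed autoencoder for each omics layer with enhanced contrastive learning and a differential attention mechanism. This lets it align and integrate multi-omics data while suppressing noise, and it outputs cluster assignments end to end.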
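The cross-modal contrastive alignment mentioned above can be illustrated with a minimal NumPy sketch. It assumes a standard symmetric InfoNCE loss in which the two omics embeddings of the same cell (for example RNA and ATAC) form a positive pair and all other cells in the batch act as negatives. The "enhanced" variant used by scECDA is not reproduced here. The function names, embedding sizes, and `temperature=0.5` are hypothetical choices.

```python
import numpy as np

def l2_normalize(z: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one cell's embedding) to unit length."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def cross_modal_info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """Symmetric InfoNCE between two modalities of the same cells (hypothetical helper).

    Row i of z_a and row i of z_b belong to the same cell (positive pair);
    all other rows in the batch serve as negatives.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = (z_a @ z_b.T) / temperature  # (n_cells, n_cells) scaled cosine similarities
    idx = np.arange(len(z_a))
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()  # match each A row to its B row
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()  # match each B row to its A row
    return float((loss_a_to_b + loss_b_to_a) / 2)

# Hypothetical usage: paired embeddings of 8 cells from two omics encoders
rng = np.random.default_rng(0)
z_rna = rng.normal(size=(8, 16))
z_atac = z_rna + 0.05 * rng.normal(size=(8, 16))  # well-aligned pairs
z_atac_mismatched = np.roll(z_atac, 1, axis=0)    # every cell paired with the wrong partner
aligned_loss = cross_modal_info_nce(z_rna, z_atac)
mismatched_loss = cross_modal_info_nce(z_rna, z_atac_mismatched)
```

The loss is lower for correctly paired embeddings than for shifted ones. That is the behavior the objective rewards: it pulls the two views of each cell together in a shared latent space and pushes views of different cells apart.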

## Key findings

- scECDA outperformed eight state-of-the-art methods in cell clustering accuracy.
- The method reduces noise interference during integration and identifies key biological markers from its latent features.
- It adapts to data from different single-cell multi-omics platforms, including 10X Multiome, CITE-seq, and TEA-seq.

## Abstract

Identifying cell types that constitute complex tissue components using single-cell
sequencing data is a critical issue in the field of biology. With the continuous
advancement of sequencing technologies, the recognition of cell types has evolved from
analyzing single-omics scRNA-seq data to integrating multi-omics single-cell data.
However, existing methods for integrative analysis of high-dimensional multi-omics
single-cell sequencing data have several limitations: they rely on specific
distributional assumptions about the data, they are sensitive to noise, and their
clustering accuracy is constrained by a separate, downstream clustering step. These
issues have limited the accuracy of cell type identification and hindered the
application of such methods to large-scale datasets. To address these challenges, we
propose scECDA, a novel method for aligning and integrating single-cell multi-omics
data.

scECDA employs independently designed autoencoders that autonomously learn the
feature distribution of each omics dataset. By incorporating enhanced contrastive
learning and differential attention mechanisms, scECDA effectively reduces the
interference of noise during data integration. The model design is highly flexible
and adapts to single-cell omics data generated by different technological platforms.
It directly outputs integrated latent features and end-to-end cell clustering results.
By analyzing the distribution of the latent features, scECDA can identify key
biological markers, precisely distinguish cell subtypes, recover cluster-specific
motifs, and infer trajectories. scECDA was applied to eight paired single-cell
multi-omics datasets covering data generated by 10X Multiome, CITE-seq, and TEA-seq
technologies. Compared with eight state-of-the-art methods, scECDA achieved higher
cell clustering accuracy.
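The differential attention mechanism named above can be illustrated with a minimal NumPy sketch. It follows the formulation popularized by the Differential Transformer. Two softmax attention maps are computed from split query/key projections. The second map, scaled by λ, is then subtracted from the first, so that attention both maps assign in common (treated as noise) cancels out. Whether scECDA uses exactly this formulation is an assumption. The fixed `lam`, the weight shapes, and the choice of treating each modality's latent embedding as one token are hypothetical. The learnable re-parameterization of λ and the per-head normalization used in the Differential Transformer are omitted.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(x, w_q, w_k, w_v, lam=0.5):
    """Single-head differential attention (simplified, hypothetical sketch).

    x:        (n_tokens, d_model), e.g. one token per omics embedding of a cell
    w_q, w_k: (d_model, 2 * d_head), split into two query/key groups
    w_v:      (d_model, d_v)
    Returns the attended values and the differential attention map.
    """
    q1, q2 = np.split(x @ w_q, 2, axis=-1)
    k1, k2 = np.split(x @ w_k, 2, axis=-1)
    v = x @ w_v
    scale = 1.0 / np.sqrt(q1.shape[-1])
    a1 = softmax(q1 @ np.swapaxes(k1, -1, -2) * scale)
    a2 = softmax(q2 @ np.swapaxes(k2, -1, -2) * scale)
    diff_map = a1 - lam * a2  # attention shared by both maps is subtracted out
    return diff_map @ v, diff_map

# Hypothetical usage: one cell with RNA, ATAC, and ADT latent embeddings as 3 tokens
rng = np.random.default_rng(0)
d_model, d_head, d_v = 32, 8, 16
tokens = rng.normal(size=(3, d_model))
w_q = rng.normal(size=(d_model, 2 * d_head)) / np.sqrt(d_model)
w_k = rng.normal(size=(d_model, 2 * d_head)) / np.sqrt(d_model)
w_v = rng.normal(size=(d_model, d_v)) / np.sqrt(d_model)
attended, attn_map = differential_attention(tokens, w_q, w_k, w_v, lam=0.5)
fused_cell_embedding = attended.mean(axis=0)  # simple mean pooling into one vector
```

Setting `lam=0` reduces the function to ordinary scaled dot-product attention, which makes a quick sanity check. With `lam > 0`, each row of the differential map sums to `1 - lam` rather than 1.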

The scECDA code is freely available at https://github.com/SuperheroBetter/scECDA

## Full-text entities

- **Genes:** GLI3 (GLI family zinc finger 3) [NCBI Gene 2737] {aka ACLS, GCPS, GLI3-190, GLI3FL, PAP-A, PAPA}, SNAR-E (small NF90 (ILF3) associated RNA E) [NCBI Gene 100170220], CD14 (CD14 molecule) [NCBI Gene 929], MEF2C (myocyte enhancer factor 2C) [NCBI Gene 4208] {aka C5DELq14.3, DEL5q14.3, NEDHSIL}, SPIC (Spi-C transcription factor) [NCBI Gene 121599] {aka SPI-C}, TCF7 (transcription factor 7) [NCBI Gene 6932] {aka TCF-1}, CD8B (CD8 subunit beta) [NCBI Gene 926] {aka CD8B1, CD8beta, LEU2, LY3, LYT3, Ly-3}, Cd34 (CD34 antigen) [NCBI Gene 12490], GATA3 (GATA binding protein 3) [NCBI Gene 2625] {aka HDR, HDRS}, FCGR3A (Fc gamma receptor IIIa) [NCBI Gene 2214] {aka CD16-II, CD16A, FCG3, FCGR3, FCRIIIA, FcGRIIIA}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, MAFK (MAF bZIP transcription factor K) [NCBI Gene 7975] {aka NFE2U, P18}, NR2F1 (nuclear receptor subfamily 2 group F member 1) [NCBI Gene 7025] {aka BBOAS, BBSOAS, COUP-TFI, COUPTF1, EAR-3, EAR3}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}
- **Diseases:** Infectious diseases (MESH:D003141), Autoimmune thyroid disease (MESH:D013967), autoimmune diseases (MESH:D001327), Graft-versus-host disease (MESH:D006086), Inflammation (MESH:D007249), T-cell leukemia virus 1 infection (MESH:D015458), Viral myocarditis (MESH:D014777), Type I diabetes mellitus (MESH:D003922), Coronavirus disease (MESH:D018352)
- **Chemicals:** scADT (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12543095/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12543095/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12543095/full.md

---
Source: https://tomesphere.com/paper/PMC12543095