# A clustering method for single-cell RNA sequencing data based on denoising and masking learning

**Authors:** Shuang Xu, Wen Yan, Bin Zhang, Hong Qi, Kai Wang

PMC · DOI: 10.3389/fbinf.2026.1758257 · Frontiers in Bioinformatics · 2026-03-03

## TL;DR

This paper introduces scDMAC, a new clustering method for single-cell RNA sequencing data that improves accuracy by reducing noise and learning gene correlations.

## Contribution

scDMAC combines a zero-inflated negative binomial (ZINB)-based denoising autoencoder with a masking autoencoder to improve clustering of sparse, noisy single-cell RNA sequencing data.

## Key findings

- scDMAC achieves higher clustering accuracy and stability than state-of-the-art methods on multiple benchmark scRNA-seq datasets.
- The method is robust to noise and sparsity in single-cell RNA sequencing data.
- Combining probabilistic denoising with masking-based learning improves biological representation extraction.

## Abstract

Single-cell RNA sequencing (scRNA-seq) enables high-throughput analysis of gene expression at single-cell resolution and plays a crucial role in studying cellular heterogeneity, tissue development, and disease mechanisms. However, scRNA-seq data are characterized by high dimensionality, sparsity, technical noise, and prevalent dropout events, which pose substantial challenges to conventional clustering approaches.

To address these challenges, we propose scDMAC, a novel clustering framework for single-cell RNA sequencing data based on denoising and masking learning. The method integrates a zero-inflated negative binomial (ZINB)-based denoising autoencoder with a masking autoencoder. First, the ZINB-based autoencoder models the count distribution and dropout events to denoise the gene expression data. Subsequently, a tailored masking strategy is applied to the denoised data to learn gene-wise correlations through reconstruction.
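To make the ZINB component concrete, the following minimal sketch computes the zero-inflated negative binomial negative log-likelihood for a single observed count. It is illustrative only and is not the authors' implementation. The function names `nb_log_pmf` and `zinb_nll` and the parameter names `mu` (mean), `theta` (dispersion), and `pi` (dropout probability) are assumptions that follow the standard ZINB parameterization; in a ZINB autoencoder these quantities would typically be predicted per gene by the decoder.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a ZINB model.

    pi is the probability that an observation is a dropout (structural zero).
    A zero count can come from either the dropout component or the NB itself;
    a positive count can only come from the NB component.
    """
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return -(math.log(1.0 - pi) + log_nb)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for count in (0, 1, 5):
        print(count, zinb_nll(count, mu=2.0, theta=1.5, pi=0.3))
```

Summing this loss over all genes and cells gives a reconstruction objective that accounts for excess zeros explicitly, rather than treating every zero as a true absence of expression.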

Extensive experiments conducted on multiple benchmark scRNA-seq datasets demonstrate that scDMAC achieves superior clustering accuracy and stability compared with state-of-the-art methods. The proposed framework consistently improves clustering performance across diverse datasets, highlighting its robustness to noise and sparsity.

By effectively combining probabilistic denoising with masking-based representation learning, scDMAC provides a powerful solution for addressing dropout and sparsity issues in scRNA-seq data. The improved clustering performance suggests that integrating distribution-aware denoising with feature reconstruction enhances the extraction of biologically meaningful representations, making scDMAC a promising tool for single-cell transcriptomic analysis.
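The masking-and-reconstruction idea can be sketched as follows. This summary does not describe the paper's tailored masking strategy, so the example substitutes plain uniform random masking, and the names `mask_genes`, `masked_mse`, and `mask_ratio` are hypothetical. It shows only the general pattern: hide some gene values in a cell's denoised profile, then score a reconstruction on the hidden positions alone.

```python
import random


def mask_genes(expression, mask_ratio=0.3, seed=0):
    """Zero out a uniformly random subset of gene positions in one cell's profile.

    Returns the masked copy and the sorted list of masked indices.
    Uniform random masking is an assumption, not the paper's tailored strategy.
    """
    rng = random.Random(seed)
    n_masked = int(round(mask_ratio * len(expression)))
    masked_idx = sorted(rng.sample(range(len(expression)), n_masked))
    masked = list(expression)
    for j in masked_idx:
        masked[j] = 0.0
    return masked, masked_idx


def masked_mse(original, reconstructed, masked_idx):
    """Mean squared error computed over the masked positions only."""
    if not masked_idx:
        return 0.0
    total = sum((original[j] - reconstructed[j]) ** 2 for j in masked_idx)
    return total / len(masked_idx)


if __name__ == "__main__":
    cell = [0.0, 3.2, 1.1, 0.0, 4.5, 2.0]  # hypothetical denoised expression values
    masked_cell, idx = mask_genes(cell, mask_ratio=0.5, seed=42)
    # A real model would predict the hidden values from the visible genes;
    # here the masked input stands in for a (poor) reconstruction.
    print(idx, masked_mse(cell, masked_cell, idx))
```

Restricting the loss to masked positions forces a model to infer hidden genes from the visible ones, which is how reconstruction can encourage learning of gene-wise correlations.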

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12993276/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12993276/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12993276/full.md

---
Source: https://tomesphere.com/paper/PMC12993276