# scIRT: Imputation and Dimensionality Reduction for Single-Cell RNA-Seq Data by Combining NMF with SMOTE

**Authors:** Yunwen Mou, Shuchao Li, Guoli Ji

PMC · DOI: 10.3390/ijms27031173 · International Journal of Molecular Sciences · 2026-01-23

## TL;DR

scIRT is an iterative pipeline that combines SMOTE with NMF to impute dropout events in single-cell RNA-seq (scRNA-seq) data and reduce its dimensionality at the same time.

## Contribution

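The contribution is an iterative pipeline that adapts the synthetic minority over-sampling technique (SMOTE) to impute dropout events and uses non-negative matrix factorization (NMF) to produce a low-dimensional representation within the same procedure.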
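For readers unfamiliar with SMOTE, the following is a minimal sketch of the classic interpolation step: a synthetic sample is placed at a random point on the segment between a sample and one of its k nearest neighbors. This summary does not describe how scIRT adapts SMOTE for imputation, so the function name `smote_samples`, its parameters, and the use of Euclidean distance are assumptions made only for illustration.

```python
import numpy as np

def smote_samples(X, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by classic SMOTE interpolation.

    X has shape (n_samples, n_features). Each synthetic row lies on the
    segment between a randomly chosen row and one of its k nearest neighbors.
    This is generic SMOTE, not the scIRT adaptation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances; the diagonal is set to infinity
    # so that a row is never selected as its own neighbor.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]   # shape (n, k)

    base = rng.integers(0, n, size=n_new)       # rows to interpolate from
    pick = rng.integers(0, k, size=n_new)       # which neighbor to use
    nbr = neighbors[base, pick]
    lam = rng.random((n_new, 1))                # interpolation weights in [0, 1)

    # x_new = x + lam * (x_neighbor - x)
    return X[base] + lam * (X[nbr] - X[base])
```

Because every synthetic row is a convex combination of two existing rows, the output stays non-negative whenever the input is, which keeps it compatible with NMF.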

## Key findings

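- On several scRNA-seq datasets, scIRT achieved better and more robust imputation performance than the existing approaches it was compared with.
- The low-dimensional representation matrix generated by scIRT improves clustering performance; the authors compare its effect on clustering with that of the imputed matrix.
- scIRT can serve as a preprocessing step for scRNA-seq data, recovering missing values ahead of downstream analyses such as cell type clustering and visualization.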

## Abstract

The establishment and development of single-cell RNA-sequencing (scRNA-seq) technology has accelerated the analysis of cell genome characteristics down to the single-cell level. Despite the rapid development of scRNA-seq technology, we cannot obtain a complete gene expression matrix in biological experiments, and the scRNA-seq data obtained from experiments also have a high dropout rate. Unfortunately, gene expression analysis and clustering tools require a complete matrix of gene expression values for classification or clustering calculations. Most imputation methods focus on the impact of the imputed high-dimensional expression matrix on clustering and cannot obtain the low-dimensional representation matrix, which may have an even better guiding effect on clustering. To this end, we designed an iterative imputation pipeline called scIRT to estimate dropout events in scRNA-seq data and achieve dimensionality reduction simultaneously by combining the synthetic minority over-sampling technique (SMOTE) and non-negative matrix factorization (NMF). The adaptation of SMOTE effectively imputes missing data, while NMF performs dimensionality reduction and feature extraction on high-dimensional data. Using several scRNA-seq datasets, we demonstrated that this new approach achieved better and more robust performance than the existing approaches. We also compared the different effects of the imputed matrix and the low-dimensional representation matrix on clustering. scIRT is a tool that can be used to preprocess scRNA-seq data. It can effectively recover missing data in scRNA-seq datasets to facilitate downstream analyses such as cell type clustering and visualization.
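To make concrete how a single factorization can yield both an imputed matrix and a low-dimensional representation, here is a hedged sketch of a generic masked NMF with multiplicative updates. It is not the scIRT algorithm, whose details are in the full paper: treating every zero as a dropout, the function name `masked_nmf_impute`, and the `rank` and `n_iter` parameters are all assumptions chosen for illustration.

```python
import numpy as np

def masked_nmf_impute(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor X ~ W @ H using only the observed (non-zero) entries.

    X has shape (genes, cells) and is non-negative. Zeros are treated as
    missing values, which is an assumption of this sketch.
    Returns the imputed matrix, W (genes x rank), and H (rank x cells).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    M = (X > 0).astype(float)           # 1 = observed, 0 = treated as dropout
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps

    for _ in range(n_iter):
        # Weighted multiplicative updates: masked entries contribute nothing
        # to the numerator or the denominator, so they do not drive the fit.
        WH = W @ H
        W *= ((M * X) @ H.T) / ((M * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (M * X)) / (W.T @ (M * WH) + eps)

    X_hat = W @ H
    imputed = np.where(M > 0, X, X_hat)  # keep observed values, fill the rest
    return imputed, W, H
```

In this sketch the columns of `H` play the role of a low-dimensional representation of the cells, while `imputed` is the completed high-dimensional matrix; either could be passed to a clustering routine.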

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12897353/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12897353/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12897353/full.md

---
Source: https://tomesphere.com/paper/PMC12897353