# Imputing missing values in single-cell RNA-sequencing data: a statistical and machine learning-based approach

**Authors:** A F M Shamsuzzaman, Sumanta Ray, Anirban Mukhopadhyay

PMC · DOI: 10.1093/bib/bbag072 · 2026-02-16

## TL;DR

This paper introduces scDDI, a method that detects dropout events in single-cell RNA sequencing data and imputes the missing gene expression values.

## Contribution

The novel scDDI method combines a Poisson–negative binomial mixture model with decision tree regression for improved dropout detection and imputation.
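The detection step can be illustrated with a minimal sketch of a two-component Poisson–negative binomial mixture. This summary does not give the paper's parameterization or fitting procedure, so everything here is an assumption: the Poisson component is taken to model the dropout (low-signal) state, the negative binomial component to model genuinely expressed counts, and the weight `pi` and the parameters `lam`, `r`, `p` are hypothetical fixed values rather than fitted estimates. The function names are illustrative and are not scDDI's API.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing count k with rate lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def nb_pmf(k, r, p):
    """Negative binomial probability of k failures before r successes,
    with success probability p in (0, 1)."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log(1 - p))

def dropout_posterior(k, pi, lam, r, p):
    """Posterior probability that count k came from the Poisson component,
    which this sketch ASSUMES represents the dropout state."""
    dropout_part = pi * poisson_pmf(k, lam)
    expressed_part = (1 - pi) * nb_pmf(k, r, p)
    return dropout_part / (dropout_part + expressed_part)

# Hypothetical parameters for one gene (not estimated from any data).
PI, LAM, R, P = 0.5, 0.1, 2.0, 0.5

counts = [0, 0, 3, 7, 0, 12]  # toy counts for that gene across six cells
posteriors = [dropout_posterior(k, PI, LAM, R, P) for k in counts]
# Flag an entry as a likely dropout when the posterior exceeds 0.5.
flagged = [post > 0.5 for post in posteriors]
```

In practice the mixture parameters would be estimated from the data, for example with expectation-maximization, and the 0.5 cutoff is likewise an illustrative choice.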

## Key findings

- scDDI outperforms existing methods in dropout detection and imputation on both simulated and real datasets.
- Improved imputation leads to better performance in downstream tasks like clustering and subpopulation identification.

## Abstract

Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to capture gene expression patterns within individual cells. However, due to the limited RNA content within cells, dropout events occur, resulting in a substantial number of zero counts in the single-cell expression matrix. To address this issue, we propose a novel method called single-cell dropout detection and imputation (scDDI). This method identifies dropout events using a Poisson–negative binomial mixture model and subsequently imputes the missing values using a decision tree regression model. We evaluate the performance of scDDI on both simulated and real scRNA-seq datasets, demonstrating its superiority over established single-cell imputation techniques. Notably, scDDI significantly improves dropout detection, leading to enhanced performance in various downstream analysis tasks like gene expression recovery, cell clustering, and cell subpopulation identification.
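To make the imputation step concrete, here is a minimal sketch of decision tree regression used to fill flagged entries. It is not the authors' implementation: the abstract does not say which features or tree settings scDDI uses, so this sketch assumes that a gene's flagged values are predicted from the other genes' expression in the same cell, trained on cells where that gene was not flagged. The small hand-written regression tree and the names `build_tree`, `predict`, and `impute_gene` are hypothetical, chosen only to keep the example dependency-free.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def build_tree(X, y, depth=0, max_depth=3):
    """Fit a small regression tree by greedy squared-error splitting."""
    if depth >= max_depth or len(y) < 2 or sse(y) == 0.0:
        return {"value": sum(y) / len(y)}
    best = None  # (error, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:  # all feature values identical, no split possible
        return {"value": sum(y) / len(y)}
    _, j, t = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= t]
    right_idx = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left_idx],
                           [y[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right_idx],
                            [y[i] for i in right_idx], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

def impute_gene(matrix, target, dropout_mask):
    """Return a copy of matrix (cells x genes) in which the target gene's
    flagged entries are replaced by tree predictions from the other genes."""
    feats = [[v for j, v in enumerate(row) if j != target] for row in matrix]
    train = [i for i, flagged in enumerate(dropout_mask) if not flagged]
    out = [row[:] for row in matrix]
    if not train:  # nothing to learn from, so leave the matrix unchanged
        return out
    tree = build_tree([feats[i] for i in train],
                      [matrix[i][target] for i in train])
    for i, flagged in enumerate(dropout_mask):
        if flagged:
            out[i][target] = predict(tree, feats[i])
    return out

# Toy data: gene 0 is the predictor, gene 1 has two flagged zeros.
cells = [[1, 10], [2, 10], [8, 50], [9, 50], [1.5, 0], [8.5, 0]]
mask = [False, False, False, False, True, True]
imputed = impute_gene(cells, target=1, dropout_mask=mask)
```

Only the flagged entries are overwritten; observed values, including zeros that were not flagged as dropouts, are left as they are.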

## Full-text entities

- **Genes:** ETNPPL (ethanolamine-phosphate phospho-lyase) [NCBI Gene 64850] {aka AGXT2L1}, LRRK2 (leucine rich repeat kinase 2) [NCBI Gene 120892] {aka AURA17, DARDARIN, PARK8, RIPK7, ROCO2}, PDGFRA (platelet derived growth factor receptor alpha) [NCBI Gene 5156] {aka CD140A, PDGFR-2, PDGFR2}, GJA1 (gap junction protein alpha 1) [NCBI Gene 2697] {aka AVSD3, CMDR, CX43, EKVP, EKVP3, GJAL}
- **Diseases:** Melanoma (MESH:D008545)

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12908672/full.md

---
Source: https://tomesphere.com/paper/PMC12908672