# Prior-guided factorization for reliable imputation of scRNA-seq data

**Authors:** You Wu, Li Xu, Ye Win Aung, Alex Michel Daoud

PMC · DOI: 10.1371/journal.pcbi.1014051 · 2026-03-20

## TL;DR

This paper introduces scZN, a method for imputing single-cell RNA sequencing data that better distinguishes true gene silencing (biological zeros) from technical dropout.

## Contribution

The main contribution is scZN, a framework that combines a two-state transcription model with nonnegative factorization to impute scRNA-seq data while keeping the result biologically interpretable.

## Key findings

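- Across multiple real datasets, scZN outperforms dozens of state-of-the-art methods, suppressing spurious activation of genes that should not be expressed and capturing true expression distributions at both the gene and cell levels.
- It improves trajectory inference in complex experimental designs, including embryonic stem cell and mouse dentate gyrus data.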
- scZN effectively recovers neuroinflammation pathways in Alzheimer’s disease data.

## Abstract

Single-cell RNA sequencing (scRNA-seq) provides an important means to reveal the heterogeneity and dynamic processes of tissues, organisms, and complex diseases, but technical capture loss (dropout) often obscures true biological expression, and existing imputation methods have difficulty distinguishing biological zeros (silent expression) from technical noise. To address this, we propose the imputation framework scZN. scZN assumes that the observed scRNA-seq data arise from a combination of RNA’s two-state transcription process and dropout, and formulates imputation as nonnegative factorization: decomposing the raw count matrix into two interpretable nonnegative factors, performing learning and optimization under constraints from prior knowledge and multiple regularizations, thereby reconstructing the cellular expression landscape. Experiments show that scZN can capture the true distributional characteristics at both the gene and cell levels and significantly suppress spurious activation of genes that should not be expressed. Across multiple real datasets, it outperforms dozens of state-of-the-art methods. Especially in complex experimental design scenarios, scZN markedly improves trajectory inference for embryonic stem cells and mouse dentate gyrus data. In Alzheimer’s disease data, scZN can also effectively recover pathways related to neuroinflammation, improving downstream scRNA-seq analysis. Overall, scZN provides a unified framework for missing-value imputation and expression reconstruction that combines accuracy and interpretability.
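The generative assumption described above (a two-state transcription process followed by dropout) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the scZN model itself: it uses the standard steady-state Poisson-Beta form of the two-state (telegraph) model, with `k_on`, `k_off`, and `s` expressed in units of the mRNA degradation rate, and it models dropout as binomial thinning with a hypothetical per-molecule `capture` probability. The function name and all parameter values are illustrative.

```python
import numpy as np

def simulate_two_state_with_dropout(n_cells, k_on, k_off, s, capture, seed=0):
    """Hypothetical sketch: two-state transcription plus binomial capture loss.

    With rates in units of the degradation rate, the steady-state mRNA count
    is Poisson(s * p) where p ~ Beta(k_on, k_off) (Poisson-Beta form).
    """
    rng = np.random.default_rng(seed)
    p = rng.beta(k_on, k_off, size=n_cells)        # fraction of time the gene is ON
    true_counts = rng.poisson(s * p)               # latent mRNA molecules per cell
    observed = rng.binomial(true_counts, capture)  # each molecule captured with prob `capture`
    return true_counts, observed

true_counts, observed = simulate_two_state_with_dropout(
    n_cells=5000, k_on=0.3, k_off=2.0, s=20.0, capture=0.1
)

# Biological zeros: no mRNA present at all (gene silent in that cell).
bio_zeros = np.sum(true_counts == 0)
# Technical zeros: mRNA present, but every molecule was lost during capture.
tech_zeros = np.sum((true_counts > 0) & (observed == 0))
print(f"biological zeros: {bio_zeros}, technical zeros: {tech_zeros}")
```

Both kinds of zeros look identical in `observed`; only the simulation's access to `true_counts` separates them, which is the ambiguity an imputation method has to resolve.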

We aim to better understand true gene expression states in single-cell RNA sequencing data, where many entries are zero: some reflect genuine gene silencing, while others reflect signals missed because of insufficient capture efficiency. Distinguishing these two types of zeros is essential for revealing cellular heterogeneity but remains challenging for existing methods. Here, we present scZN, a single-cell data imputation framework based on a stochastic two-state model of RNA transcription that explicitly accounts for dropout. By formulating imputation as a biologically constrained nonnegative factorization problem, scZN recovers gene expression while maintaining interpretability. Across multiple real datasets, scZN more effectively suppresses spurious gene activation and improves downstream analyses such as developmental trajectory inference and pathway analysis, particularly in complex experimental designs and disease-related data, providing a unified and biologically meaningful solution for handling missing values in single-cell RNA sequencing.
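To make "imputation as nonnegative factorization" concrete, the following sketch factors a genes-by-cells matrix into two nonnegative factors using weighted multiplicative updates with an L2 penalty. It is a generic weighted NMF, not scZN's algorithm: the function name `weighted_nmf_impute`, the `zero_weight` down-weighting of observed zeros (a crude stand-in for the paper's prior knowledge), and the regularization strength `lam` are all hypothetical choices.

```python
import numpy as np

def weighted_nmf_impute(X, rank, zero_weight=0.1, lam=0.01, n_iter=300, seed=0, eps=1e-10):
    """Hypothetical sketch: weighted NMF, X (genes x cells) ~= W @ H.

    Minimizes 0.5 * sum(M * (X - W @ H)**2) + 0.5 * lam * (|W|^2 + |H|^2),
    where M = 1 on nonzero entries and `zero_weight` on observed zeros.
    Multiplicative updates keep W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_cells)) + eps
    M = np.where(X > 0, 1.0, zero_weight)  # per-entry confidence weights
    MX = M * X
    losses = []
    for _ in range(n_iter):
        # Update W: ratio of the negative to the positive part of the gradient.
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + lam * W + eps)
        # Update H with the same rule, using the refreshed W.
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + lam * H + eps)
        WH = W @ H
        loss = 0.5 * np.sum(M * (X - WH) ** 2) + 0.5 * lam * (np.sum(W**2) + np.sum(H**2))
        losses.append(loss)
    return W, H, losses

# Toy data: a rank-3 nonnegative "truth" with ~40% of entries zeroed as simulated dropout.
rng = np.random.default_rng(1)
truth = rng.gamma(2.0, 1.0, (50, 3)) @ rng.gamma(2.0, 1.0, (3, 200))
observed = truth * (rng.random(truth.shape) > 0.4)
W, H, losses = weighted_nmf_impute(observed, rank=3)
imputed = W @ H  # nonnegative reconstruction; zeros are filled from the low-rank structure
```

Down-weighting zeros lets the low-rank reconstruction fill them in, while `zero_weight=1.0` recovers ordinary regularized NMF, which fits every observed zero as if it were true silencing. The toy data contain only technical zeros, so this example cannot show how well biological zeros are preserved; separating the two is the role of scZN's transcription-model priors.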

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)
- **Species:** Mus musculus (taxon 10090)

## Full-text entities

- **Diseases:** neuroinflammation (MESH:D000090862), Alzheimer's disease (MESH:D000544)
- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13004523/full.md

---
Source: https://tomesphere.com/paper/PMC13004523