# Deconvolving cell-type-specific gene expression profiles from bulk RNA-seq samples

**Authors:** Sichen Zhu, Zhengqi Wang, Kevin D. Bunting, Peng Qiu

PMC · DOI: 10.1371/journal.pcbi.1014101 · PLOS Computational Biology · 2026-03-26

## TL;DR

A new deep learning algorithm called BLUE can extract cell-type proportions and cell-type-specific gene expression profiles from bulk RNA-seq data, enabling better cancer patient subtyping and biomarker discovery.

## Contribution

BLUE, a U-Net-based algorithm, outperforms existing methods in deconvolving bulk RNA-seq into cell-type-specific gene expression profiles.

## Key findings

- BLUE accurately predicts cell-type proportions and cell-type-specific gene expression profiles from bulk RNA-seq data.
- The algorithm enables integrative cancer patient subtyping and identification of cell-type-specific prognostic gene signatures.
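The deconvolution task behind these findings can be written with the linear mixing model that reference-based deconvolution methods commonly assume. The notation below is ours, not the paper's, and this summary does not show BLUE's exact formulation or training objective. For bulk sample $j$, gene $g$, and $K$ cell types:

$$
b_{gj} = \sum_{k=1}^{K} p_{kj}\, x_{gkj}, \qquad p_{kj} \ge 0, \qquad \sum_{k=1}^{K} p_{kj} = 1
$$

Here $b_{gj}$ is the observed bulk expression, $p_{kj}$ is the proportion of cell type $k$ in sample $j$, and $x_{gkj}$ is the expression of gene $g$ in cell type $k$ within that same sample. Proportion-only methods replace $x_{gkj}$ with a fixed reference signature $s_{gk}$ shared by all samples; also estimating the sample-specific $x_{gkj}$ is the harder part of the problem that BLUE addresses.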

## Abstract

Bulk RNA sequencing (bulk RNA-seq) and single-cell RNA sequencing (scRNA-seq) are two important high-throughput sequencing platforms with wide applications in biomedical research. Bulk RNA-seq measures the average gene expression of all cells in a sample at low experimental cost, whereas scRNA-seq enables transcriptomic profiling at single-cell resolution, albeit at higher experimental cost. To integrate the strengths of both approaches and capitalize on the wealth of existing bulk RNA-seq datasets, we developed BLUE, a U-Net-based deep learning algorithm that deconvolves bulk RNA-seq samples into cell-type proportions and cell-type-specific gene expression profiles. BLUE leverages the feature extraction and representation learning capabilities of its U-Net backbone to predict cell-type-specific gene expression profiles accurately, significantly outperforming existing deconvolution algorithms. Building on these accurate predictions, we developed an integrative framework for subtyping cancer patients and identifying cell-type-specific gene signatures that can serve as prognostic biomarkers for cancer.

Understanding the behavior of different cell types in the human body is crucial for advancing medical research, especially in areas such as cancer diagnosis and treatment. Scientists use powerful sequencing technologies to measure gene expression, either by examining all the cells in a sample together (affordable but less detailed) or by analyzing individual cells (more detailed but significantly more expensive). In this work, we aim to combine the advantages of both approaches.

We developed a novel computational algorithm, BLUE, to “unmix” cell-type-level information from cheaper bulk samples and reveal what is happening inside each cell type. We adapted U-Net, a deep learning architecture originally designed for biomedical image segmentation, to uncover cell-type-specific gene activity with high accuracy.
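For contrast with BLUE's approach, the sketch below shows the classical way to “unmix” a bulk profile when a fixed reference signature matrix is available: non-negative least squares (NNLS) via `scipy.optimize.nnls`. This is a swapped-in baseline, not BLUE's method. It assumes every sample shares one genes × cell-types signature matrix, recovers only proportions, and uses synthetic, noise-free data with hypothetical helper names (`simulate_bulk`, `estimate_proportions`).

```python
import numpy as np
from scipy.optimize import nnls


def simulate_bulk(signatures, proportions):
    """Mix a genes x cell-types signature matrix into one bulk profile (linear mixing model)."""
    return signatures @ proportions


def estimate_proportions(bulk, signatures):
    """Baseline reference-based deconvolution (not BLUE): NNLS, then rescale to sum to 1."""
    coef, _residual_norm = nnls(signatures, bulk)
    total = coef.sum()
    return coef / total if total > 0 else coef


rng = np.random.default_rng(0)
n_genes, n_types = 200, 4
# Hypothetical reference: positive, gamma-distributed expression per gene and cell type.
signatures = rng.gamma(shape=2.0, scale=5.0, size=(n_genes, n_types))
true_p = np.array([0.5, 0.3, 0.15, 0.05])

bulk = simulate_bulk(signatures, true_p)  # noise-free synthetic bulk sample
est_p = estimate_proportions(bulk, signatures)
print(np.round(est_p, 3))
```

Because the signature matrix is fixed, NNLS cannot describe how a given cell type's expression differs from one patient to the next; predicting those sample-specific profiles is what BLUE is designed to do. Adding noise to `bulk` is an easy way to see how the baseline's estimate degrades.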

Using BLUE, we can make better use of the large amounts of existing bulk data to study diseases such as cancer. For example, we can group patients based on the gene activity of their individual cell types and discover specific gene patterns that may help predict how a person’s cancer will progress. This can deepen our understanding of cancer biology and ultimately support more personalized and effective treatment strategies.
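One plausible way to turn deconvolved output into patient subtypes is to concatenate each patient's cell-type-specific profiles into a single feature vector and cluster those vectors. The sketch below is hypothetical: the array layout (patients × cell types × genes), the per-feature z-scoring, Ward hierarchical clustering from `scipy.cluster.hierarchy`, the helper names, and the synthetic data are all our assumptions, and the paper's integrative framework may differ. It does not cover the survival analysis needed to call a gene signature prognostic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def integrative_features(profiles):
    """Flatten (patients, cell_types, genes) profiles into one row per patient.

    Each (cell type, gene) column is z-scored across patients so that no
    cell type dominates the distances just because of its expression scale.
    """
    n_patients = profiles.shape[0]
    flat = profiles.reshape(n_patients, -1)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (flat - mean) / std


def subtype_patients(profiles, n_subtypes):
    """Ward hierarchical clustering on the integrated features; labels run 1..n_subtypes."""
    features = integrative_features(profiles)
    tree = linkage(features, method="ward")
    return fcluster(tree, t=n_subtypes, criterion="maxclust")


rng = np.random.default_rng(1)
n_patients, n_types, n_genes = 40, 2, 30
profiles = rng.normal(loc=10.0, scale=1.0, size=(n_patients, n_types, n_genes))
# Hypothetical subtype signal: patients 20-39 over-express genes in cell type 0 only.
profiles[20:, 0, :] += 8.0

labels = subtype_patients(profiles, n_subtypes=2)
print(labels)
```

In this toy example the subtype signal lives entirely in cell type 0. In a bulk profile that shift would be scaled down by cell type 0's proportion and mixed with the other cell types' variation, which is the motivation for clustering on cell-type-specific rather than bulk expression.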

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, PTEN (phosphatase and tensin homolog) [NCBI Gene 5728] {aka 10q23del, BZS, CWS1, DEC, GLM2, MHAM}, FLT3 (fms related receptor tyrosine kinase 3) [NCBI Gene 2322] {aka CD135, FLK-2, FLK2, STK1}, HMBS (hydroxymethylbilane synthase) [NCBI Gene 3145] {aka ENCEP, LENCEP, PBG-D, PBGD, PORC, UPS}
- **Diseases:** T2D3 (MESH:C566342), AML (MESH:D015470), T2D (MESH:D003924), T2D4 (MESH:C564299), inflammatory (MESH:D007249), T2D1 (MESH:C563359), Cancer (MESH:D009369), GMP (MESH:D055501), melanoma (MESH:D008545)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13038110/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13038110/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC13038110/full.md

---
Source: https://tomesphere.com/paper/PMC13038110