# Studying relative RNA localization from nucleus to the cytosol

**Authors:** Vasilis F Ntasis, Roderic Guigó

PMC · DOI: 10.1093/nargab/lqaf032 · 2025-06-20

## TL;DR

This paper introduces a method to accurately estimate how each RNA is distributed between the nucleus and the cytosol, using RNA sequencing data from the nucleus, the cytosol, and the whole cell.

## Contribution

A novel method for estimating RNA localization by combining nuclear, cytosolic, and whole-cell RNAseq data.
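The following sketch shows why the whole-cell library is needed. It assumes that each library reports relative abundances normalised to its own total (for example TPM). Write $C_i$ and $N_i$ for the unknown absolute amounts of transcript $i$ in the cytosol and nucleus, $C = \sum_i C_i$, $N = \sum_i N_i$, and $f = C/(C+N)$ for the cytosolic fraction of total RNA. This notation is ours and may not match the paper's. The three observed profiles then satisfy:

$$
c_i = \frac{C_i}{C},\qquad n_i = \frac{N_i}{N},\qquad w_i = \frac{C_i + N_i}{C + N}
\quad\Longrightarrow\quad
w_i = f\,c_i + (1 - f)\,n_i .
$$

Nuclear and cytosolic libraries alone fix only $c_i$ and $n_i$, and these are compatible with any $f$. Summing the identity over $i$ gives the uninformative $1 = f + (1 - f)$, so the information comes from per-transcript variation. One possible estimator, assumed here for illustration, is a least-squares fit across transcripts. The cytosolic share of each transcript then follows by substituting $\hat f$ for $f$:

$$
\hat f = \frac{\sum_i (c_i - n_i)(w_i - n_i)}{\sum_i (c_i - n_i)^2},
\qquad
\frac{C_i}{C_i + N_i} = \frac{f\,c_i}{f\,c_i + (1 - f)\,n_i} = \frac{f\,c_i}{w_i} .
$$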

## Key findings

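- The method first estimates the fraction of the total RNA volume in the cytosol (and hence in the nucleus), and then that fraction for every transcript.
- It was validated on simulated data and on available nuclear and cytosolic single-cell RNAseq data.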
- The approach was applied to bulk RNAseq data from the ENCODE project to study RNA localization.

## Abstract

The precise coordination of important biological processes, such as differentiation and development, relies heavily on the regulation of gene expression. In eukaryotic cells, understanding how RNA transcripts are distributed between the nucleus and the cytosol is essential for gaining insight into protein production. The most efficient way to estimate the levels of RNA species genome-wide is RNA sequencing (RNAseq). RNAseq can be performed separately on the nucleus and on the cytosol. However, comparing transcript levels between compartments is challenging because the measurements are relative to the unknown total RNA volume in each compartment. Here, we show theoretically that accurate estimates of transcript localization can be obtained if whole-cell RNAseq is performed in addition to nuclear and cytosolic RNAseq. Based on this, we designed a method that first estimates the fraction of the total RNA volume in the cytosol (or nucleus) and then estimates this fraction for every transcript. We evaluate our methodology on simulated data and on available nuclear and cytosolic single-cell data. Finally, we use our method to investigate the subcellular localization of transcripts using bulk RNAseq data from the ENCODE project.
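The following minimal Python sketch illustrates the two-step idea in the abstract. It rests on three assumptions. First, each library is a vector of relative abundances over the same transcripts. Second, whole-cell abundance is the mixture w_i = f c_i + (1 - f) n_i, where f is the cytosolic fraction of total RNA. Third, f is fitted by ordinary least squares across transcripts. The functions `normalise`, `estimate_cytosolic_fraction`, and `transcript_cytosolic_fraction` are hypothetical names, and least squares is an illustrative choice, not the authors' published estimator. Because the simulated data are noise-free, the estimate should match the true fraction up to rounding.

```python
import numpy as np


def normalise(x):
    """Rescale a non-negative abundance vector so it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def estimate_cytosolic_fraction(cyt, nuc, whole):
    """Hypothetical helper: least-squares f in w_i = f*c_i + (1 - f)*n_i."""
    c, n, w = normalise(cyt), normalise(nuc), normalise(whole)
    x, y = c - n, w - n  # under the assumed model, y = f * x for every transcript
    f_hat = np.dot(x, y) / np.dot(x, x)  # regression through the origin
    return float(np.clip(f_hat, 0.0, 1.0))  # a fraction must lie in [0, 1]


def transcript_cytosolic_fraction(cyt, nuc, f):
    """Hypothetical helper: share of each transcript's molecules in the cytosol."""
    c, n = normalise(cyt), normalise(nuc)
    in_cyt = f * c
    total = in_cyt + (1.0 - f) * n
    # Transcripts absent from both compartments get NaN instead of 0/0
    return np.divide(in_cyt, total, out=np.full_like(total, np.nan), where=total > 0)


# Noise-free simulation: absolute amounts are known here but hidden from the estimator
rng = np.random.default_rng(0)
nuc_abs = rng.gamma(2.0, 1.0, size=2000)
cyt_abs = 3.0 * rng.gamma(2.0, 1.0, size=2000)  # cytosol holds roughly 3x more RNA
true_f = cyt_abs.sum() / (cyt_abs.sum() + nuc_abs.sum())

# Each library only reports relative abundances (scaled to one million, like TPM)
cyt_lib = 1e6 * normalise(cyt_abs)
nuc_lib = 1e6 * normalise(nuc_abs)
whole_lib = 1e6 * normalise(cyt_abs + nuc_abs)

f_hat = estimate_cytosolic_fraction(cyt_lib, nuc_lib, whole_lib)
per_tx = transcript_cytosolic_fraction(cyt_lib, nuc_lib, f_hat)
print(f"true f = {true_f:.4f}, estimated f = {f_hat:.4f}")
```

With real libraries, the fit is only approximate. Sampling noise and transcripts that are poorly captured in one compartment both introduce error, and a weighted or robust regression may then be preferable. Check against the paper which estimator the authors actually use.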

## Full-text entities

- **Genes:** GAPDH (glyceraldehyde-3-phosphate dehydrogenase) [NCBI Gene 2597] {aka G3PD, GAPD, HEL-S-162eP}, NEAT1 (nuclear paraspeckle assembly transcript 1) [NCBI Gene 283131] {aka LINC00084, NCRNA00084, TP53LC15, TncRNA, VINC}
- **Diseases:** neuromuscular disorders (MESH:D009468), LI (MESH:C566784)
- **Chemicals:** PolyA (MESH:D011061), GC (MESH:C057580)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** NHEK: Homo sapiens (Human), finite cell line (CVCL_9Q50); GM12878: Homo sapiens (Human), transformed cell line (CVCL_7526); IMR-90: Homo sapiens (Human), finite cell line (CVCL_0347); MCF-7: Homo sapiens (Human), invasive breast carcinoma of no special type, cancer cell line (CVCL_0031); K562: Homo sapiens (Human), blast phase chronic myelogenous leukemia, BCR-ABL1 positive, cancer cell line (CVCL_0004); SK-N-SH: Homo sapiens (Human), neuroblastoma, cancer cell line (CVCL_0531); HeLa-S3: Homo sapiens (Human), human papillomavirus-related endocervical adenocarcinoma, cancer cell line (CVCL_0058); HUVEC: Homo sapiens (Human), finite cell line (CVCL_2959); A549: Homo sapiens (Human), lung adenocarcinoma, cancer cell line (CVCL_0023); H1: Homo sapiens (Human), embryonic stem cell (CVCL_9771); HepG2: Homo sapiens (Human), hepatoblastoma, cancer cell line (CVCL_0027)

## Figures

The complete paper, including 4 figures with captions, is available at: https://tomesphere.com/paper/PMC12204760/full.md

---
Source: https://tomesphere.com/paper/PMC12204760