# Effects of subsampling on characteristics of RNA-seq data from triple-negative breast cancer patients

**Authors:** Alexey Stupnikov, Galina V Glazko, Frank Emmert-Streib

PMC · DOI: 10.1186/s40880-015-0040-8 · Chinese Journal of Cancer · 2015-08-08

## TL;DR

This study examines how subsampling RNA-seq data from triple-negative breast cancer patients affects analysis results, and estimates the sequencing depth at which those results saturate.

## Contribution

The study characterizes how sequencing depth and the choice of summary statistic affect RNA-seq analysis results, with the aim of establishing robust analysis procedures for clinical practice.

## Key findings

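- Subsampling RNA-seq data yields biologically realistic simulations of sequencing experiments with lower sequencing depth; directly scaling count matrices does not.
- Results saturate only when the average sequencing depth exceeds 32 million reads and the individual (per-sample) sequencing depth exceeds 46 million reads.
- Without feature selection, higher moments of the distribution of all expressed genes are more sensitive for signal detection than the corresponding mean values.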
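The contrast in the first finding can be illustrated with a minimal sketch. It assumes reads are kept independently with a fixed probability, which at the count level amounts to binomial thinning; the paper subsamples the sequencing data itself, so this is a count-level stand-in and not the authors' procedure. The function names and the example counts are hypothetical.

```python
import random


def subsample_counts(counts, fraction, seed=0):
    """Binomial thinning: keep each read independently with probability `fraction`.

    Assumption: this count-level thinning stands in for subsampling reads.
    """
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def scale_counts(counts, fraction):
    """Direct scaling: multiply each count by `fraction` and round."""
    return [round(c * fraction) for c in counts]


# Hypothetical per-gene read counts for one sample (illustrative only).
counts = [0, 1, 3, 10, 50, 200, 1000]
fraction = 0.25

thinned = subsample_counts(counts, fraction, seed=1)
scaled = scale_counts(counts, fraction)

# Repeating the thinning for a single gene with different seeds gives
# varying values, whereas scaling always returns the same number.
draws = [subsample_counts([10], fraction, seed=s)[0] for s in range(20)]

print("thinned:", thinned)
print("scaled: ", scaled)
print("repeated thinning of a 10-read gene:", draws)
```

Scaling is deterministic and carries no sampling noise, so every simulated low-depth replicate is identical. Thinning lets low-count genes fluctuate and drop to zero, which is the kind of variability a real experiment at lower depth would show.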

## Abstract

Data from RNA-seq experiments provide a wealth of information about the transcriptome of an organism. However, the analysis of such data is very demanding. In this study, we aimed to establish robust analysis procedures that can be used in clinical practice.

We studied RNA-seq data from triple-negative breast cancer patients. Specifically, we investigated the subsampling of RNA-seq data.

The main results of our investigations are as follows: (1) the subsampling of RNA-seq data gave biologically realistic simulations of sequencing experiments with smaller sequencing depth but not direct scaling of count matrices; (2) the saturation of results required an average sequencing depth larger than 32 million reads and an individual sequencing depth larger than 46 million reads; and (3) for an abrogated feature selection, higher moments of the distribution of all expressed genes had a higher sensitivity for signal detection than the corresponding mean values.

Our results reveal important characteristics of RNA-seq data that must be understood before one can apply such an approach to translational medicine.
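Result (3) compares the mean of the expression distribution with its higher moments. The following is a rough sketch of how such moments could be computed for one sample. The `log2(count + 1)` transform, the definition of "expressed" as a non-zero count, the function names, and the example counts are all assumptions made for illustration; the paper's exact definitions are in the full text.

```python
import math


def central_moment(values, k):
    """k-th central moment (population form) of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def log_expressed(counts):
    """Assumed preprocessing: drop zero counts, then apply log2(count + 1)."""
    return [math.log2(c + 1) for c in counts if c > 0]


def distribution_summary(values):
    """Mean, variance, skewness, and kurtosis of `values`."""
    mean = sum(values) / len(values)
    var = central_moment(values, 2)
    if var == 0:
        # Standardized moments are undefined for a constant sample.
        skew = kurt = float("nan")
    else:
        sd = math.sqrt(var)
        skew = central_moment(values, 3) / sd ** 3
        kurt = central_moment(values, 4) / sd ** 4
    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}


# Hypothetical per-gene counts for one sample at two depths (illustrative only).
full_depth = [0, 2, 5, 8, 40, 120, 900, 15, 3, 0]
low_depth = [0, 0, 1, 2, 11, 29, 230, 4, 1, 0]

for name, counts in [("full", full_depth), ("low", low_depth)]:
    print(name, distribution_summary(log_expressed(counts)))
```

Comparing these summaries across subsampling levels shows which statistic shifts first as depth decreases.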

## Linked entities

- **Diseases:** triple-negative breast cancer (MONDO:0005494)

## Full-text entities

- **Diseases:** breast cancer (MESH:D001943), cancer (MESH:D009369), TNBC (MESH:D064726)
- **Sequencing runs:** SRR1313133 (SRA run accession)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC4593382/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC4593382/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC4593382/full.md

---
Source: https://tomesphere.com/paper/PMC4593382