# Benchmarking of methods to analyse data derived from GBS-MeDIP

**Authors:** Violeta de Anca Prado, Fábio Pértille, Pedro Sá, Marta Gòdia, Joëlle Rüegg, Josep C. Jimenez-Chillaron, Carlos Guerrero-Bosagna

PMC · DOI: 10.1186/s12859-025-06330-x · 2026-01-19

## TL;DR

This paper benchmarks bioinformatics tools for analyzing GBS-MeDIP data and finds that featureCounts (for count matrix generation) and the Mann-Whitney test (for differential methylation analysis) give the most accurate results.

## Contribution

The study identifies the most reliable tools for GBS-MeDIP data analysis and shows that standard RNA-seq and MeDIP-seq pipelines are not fully adequate for this data type.

## Key findings

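- featureCounts outperforms MEDIPS for count matrix generation in GBS-MeDIP data; MEDIPS introduced biases in count estimation.
- Among the methods evaluated (edgeR, limma, DESeq2, and the Mann-Whitney test), the Mann-Whitney test has the lowest false positive rate and the highest true positive rate for differential methylation analysis.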
- Standard RNA-seq or MeDIP-seq pipelines introduce statistical artifacts in GBS-MeDIP data analysis.
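
The two rates used to rank the methods can be made concrete with a small sketch. It assumes a set of windows known to be truly differentially methylated is available; how the paper constructed its ground truth is not described in this summary, so the window names, the `detection_rates` helper, and the example sets below are all hypothetical.

```python
def detection_rates(called, truth, all_windows):
    """Return (TPR, FPR) for a set of windows called as differentially methylated.

    called      -- windows a method reported as significant
    truth       -- windows that are truly differential (hypothetical ground truth)
    all_windows -- every window that was tested
    """
    called, truth, all_windows = set(called), set(truth), set(all_windows)
    negatives = all_windows - truth           # windows with no true difference
    true_pos = len(called & truth)            # correctly reported windows
    false_pos = len(called & negatives)       # wrongly reported windows
    tpr = true_pos / len(truth) if truth else float("nan")
    fpr = false_pos / len(negatives) if negatives else float("nan")
    return tpr, fpr


# Hypothetical example: 10 tested windows, 4 truly differential.
all_windows = [f"w{i}" for i in range(1, 11)]
truth = {"w1", "w2", "w3", "w4"}
called = {"w1", "w2", "w3", "w7"}

tpr, fpr = detection_rates(called, truth, all_windows)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")  # TPR = 3/4, FPR = 1/6
```

A method is preferable when it raises the first number without raising the second, which is the sense in which the Mann-Whitney test is reported to perform best.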

## Abstract

Genotyping-by-Sequencing with Methylated DNA Immunoprecipitation (GBS-MeDIP) is an emerging method for cost-effective DNA methylation analysis. However, due to its unique sequencing output, conventional bioinformatics pipelines used for RNA-seq and MeDIP-seq are not fully adequate for analyzing GBS-MeDIP data. Selecting the appropriate statistical methods for differential methylation analysis remains a challenge, as existing approaches may introduce bias or false positives.

We benchmarked multiple statistical methods for analyzing GBS-MeDIP data using previously generated datasets from chickens, dogs, and pigs. featureCounts was identified as the most reliable tool for count matrix generation, outperforming MEDIPS, which introduced biases in count estimation. For differential methylation analysis, we evaluated edgeR, limma, DESeq2, and the Mann-Whitney test. Our results demonstrated that the Mann-Whitney test provided the lowest false positive rate and the highest true positive rate, outperforming edgeR, DESeq2, and limma. edgeR’s quasi-likelihood method exhibited a high false positive rate, making it unsuitable for GBS-MeDIP analysis.
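
As an illustration of what a per-window Mann-Whitney comparison involves, here is a minimal pure-Python sketch. It is not the authors' implementation: the window names, the counts, and the helper functions are hypothetical, and the p-value is computed by enumerating every relabeling of the samples, which is only practical for small groups.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for x versus y: each pair with x > y scores 1, ties score 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def mann_whitney_exact(x, y):
    """Two-sided exact p-value obtained by enumerating all group relabelings."""
    pooled = list(x) + list(y)
    n_x, n_total = len(x), len(pooled)
    centre = n_x * len(y) / 2.0               # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    hits = total = 0
    for idx in combinations(range(n_total), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n_total) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(group_x, group_y) - centre) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical read counts per window: (group A samples, group B samples).
windows = {
    "win_1": ([10, 12, 11, 13], [1, 2, 3, 0]),   # clearly separated groups
    "win_2": ([5, 6, 5, 7], [6, 5, 7, 5]),       # identical distributions
}

p_values = {name: mann_whitney_exact(a, b) for name, (a, b) in windows.items()}
for name, p in p_values.items():
    print(name, round(p, 4))
```

Because the test uses only the ordering of the counts, it makes no assumption about their distribution, unlike the count models behind edgeR and DESeq2. With four samples per group the smallest attainable two-sided p-value is 2/70, so group size limits how strong the evidence for any single window can be.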

Our findings highlight that GBS-MeDIP data should not be analyzed using standard RNA-seq or MeDIP-seq pipelines, as these approaches lead to statistical artifacts. Instead, we recommend featureCounts for count matrix creation and Mann-Whitney for differential methylation analysis, ensuring accurate detection of differentially methylated windows. This study provides a bioinformatics framework for analyzing GBS-MeDIP data, minimizing biases and improving reliability in epigenomic research.

The online version contains supplementary material available at 10.1186/s12859-025-06330-x.

## Full-text entities

- **Genes:** TPR (translocated promoter region, nuclear basket protein) [NCBI Gene 100520507]
- **Diseases:** GLM (MESH:D004195), GBS (MESH:D010855), RRBS (MESH:D001523)
- **Chemicals:** Bisulfite (MESH:C042345), 5-Methyl cytosine (MESH:D044503), 5'-Cytosine-phosphate-Guanine-3' (-), cytosine (MESH:D003596), thymines (MESH:D013941), uracil (MESH:D014498)
- **Species:** Gallus (genus) [taxon 9030], Sus scrofa (pig, species) [taxon 9823], Canis lupus familiaris (dog, subspecies) [taxon 9615], Gallus gallus (bantam, species) [taxon 9031]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12829230/full.md

---
Source: https://tomesphere.com/paper/PMC12829230