# Systematic evaluation of de novo mutation calling tools using whole genome sequencing data

**Authors:** Anushi Shah, Steven Monger, Michael Troup, Eddie K K Ip, Eleni Giannoulatou

PMC · DOI: 10.1093/bib/bbaf543 · Briefings in Bioinformatics · 2025-11-06

## TL;DR

This study compares five tools for calling de novo mutations (new genetic variants in an offspring that are absent from both parents) on real and simulated whole genome sequencing data from parent-offspring trios.

## Contribution

The first systematic evaluation of de novo mutation calling tools using both real and simulated whole genome sequencing data.

## Key findings

- Only 8.4% of de novo mutations were detected by all tools in real data, with 83.8% found by only one tool.
- DeNovoGear achieved the highest F1 score on the real 1000 Genomes Project trio, while DeNovoCNN achieved the highest F1 score on the simulated data.
- Concordance was also low (3.9%) in the simulated trio dataset, which was spiked with 100 known de novo mutations.

## Abstract

De novo mutations (DNMs) are genetic alterations that occur for the first time in an offspring. DNMs have been found to be a significant cause of severe developmental disorders. With the widespread use of next-generation sequencing (NGS) technologies, accurate detection of DNMs is crucial. Several bioinformatics tools have been developed to call DNMs from NGS data, but no study to date has systematically compared these tools. We used both real whole genome sequencing (WGS) data from a trio from the 1000 Genomes Project (1000G) and an in-house simulated trio dataset to evaluate five DNM calling tools: DeNovoGear, TrioDeNovo, PhaseByTransmission, VarScan 2, and DeNovoCNN. For DNMs called in the real dataset, we observed 8.4% concordance of variants between all tools, while 83.8% of DNM variants were identified by only one caller. For the simulated trio WGS dataset spiked with 100 DNMs, the concordance rate was also low at 3.9%. DeNovoGear achieved the highest F1 score on the real 1000G dataset, while DeNovoCNN had the highest F1 score on the simulated data. Our study provides valuable recommendations for the selection and application of DNM callers on WGS trio data.
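The two metrics reported in the abstract, concordance between callers and F1 score against a truth set, can be illustrated with a minimal sketch. It assumes that concordance is measured over the union of all calls and that variants are matched exactly on (chromosome, position, ref, alt); the paper's precise definitions may differ. The tool names, variants, and truth set below are hypothetical toy values, not data from the study.

```python
from collections import Counter


def concordance(calls_by_tool):
    """Return (fraction of variants called by all tools,
    fraction called by exactly one tool), over the union of all calls."""
    # Count how many tools reported each distinct variant.
    counts = Counter(v for calls in calls_by_tool.values() for v in set(calls))
    total = len(counts)
    if total == 0:
        return 0.0, 0.0
    n_tools = len(calls_by_tool)
    by_all = sum(1 for c in counts.values() if c == n_tools)
    by_one = sum(1 for c in counts.values() if c == 1)
    return by_all / total, by_one / total


def f1_score(called, truth):
    """F1 = harmonic mean of precision and recall for one caller."""
    tp = len(called & truth)   # true DNMs that were called
    fp = len(called - truth)   # calls not in the truth set
    fn = len(truth - called)   # true DNMs that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy variants: (chromosome, position, ref, alt).
v1, v2, v3 = ("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")
v4, v5, v6 = ("chr2", 75, "T", "C"), ("chr3", 10, "A", "T"), ("chr3", 90, "G", "C")

# Hypothetical caller outputs and truth set.
calls = {
    "tool_a": {v1, v2, v3},
    "tool_b": {v1, v4},
    "tool_c": {v1, v2, v5},
}
truth = {v1, v2, v6}

frac_all, frac_one = concordance(calls)
f1_a = f1_score(calls["tool_a"], truth)
```

In this toy example, one of the five distinct variants is shared by all three callers and three are unique to a single caller, which mirrors in miniature how a low all-tool concordance and a high single-caller fraction can coexist.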

## Full-text entities

- **Diseases:** developmental disorders (MESH:D002658)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12596277/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12596277/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12596277/full.md

---
Source: https://tomesphere.com/paper/PMC12596277