# Comparison of variant callers using 60 532 multi-ancestry whole genome sequences

**Authors:** Hufeng Zhou, Zilin Li, Derek Shyr, Xihao Li, Haoyu Yang, Rounak Dey, Yushi Tang, Robert Maier, Eric Boerwinkle, Steve Buyske, Mark Daly, Adam Felsenfeld, Richard A Gibbs, Namrata Gupta, Ira M Hall, Tara Matise, Ginger A Metcalf, Albert Smith, Catherine Reeves, Heidi J Sofia, Nathan O Stitziel, Michael C Zody, Benjamin Neale, Xihong Lin

PMC · DOI: 10.1093/bib/bbag130 · Briefings in Bioinformatics · 2026-03-27

## TL;DR

This study compares two widely used variant callers, GATK and VT, on 60 532 multi-ancestry whole genome sequences.

## Contribution

The study evaluates the consistency and reliability of GATK and VT variant callers on a large multi-ancestry dataset.

## Key findings

- After quality control, the GATK and VT pipelines produce highly consistent single nucleotide variant (SNV) calls.
- The pipelines show greater discrepancies in calling insertions and deletions (INDELs).
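One simple way to quantify the SNV/INDEL contrast above is the site-level overlap between two callsets, computed separately for each variant class. The sketch below illustrates the idea and is not the authors' method. The `Variant` record, the `site_concordance` helper, the Jaccard-style metric, and the exact match on (chrom, pos, ref, alt) are all hypothetical choices.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical biallelic site record, compared by exact (chrom, pos, ref, alt)."""
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def is_snv(self) -> bool:
        # SNV = single-base REF and ALT; every other record (including
        # multi-base substitutions) falls into the "INDEL" bucket here.
        return len(self.ref) == 1 and len(self.alt) == 1


def site_concordance(calls_a: Iterable[Variant],
                     calls_b: Iterable[Variant]) -> Dict[str, float]:
    """Jaccard overlap |A & B| / |A | B| of two callsets, per variant class."""
    a, b = set(calls_a), set(calls_b)
    result: Dict[str, float] = {}
    for label, want_snv in (("SNV", True), ("INDEL", False)):
        sa = {v for v in a if v.is_snv == want_snv}
        sb = {v for v in b if v.is_snv == want_snv}
        union = sa | sb
        # NaN when neither callset has any site of this class
        result[label] = len(sa & sb) / len(union) if union else float("nan")
    return result


# Toy data: identical SNVs; the same 1-bp T deletion in an "ATT" context
# written at two different positions.
gatk_calls = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
              Variant("chr1", 300, "AT", "A")]
vt_calls = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
            Variant("chr1", 301, "TT", "T")]
print(site_concordance(gatk_calls, vt_calls))  # {'SNV': 1.0, 'INDEL': 0.0}
```

Exact matching counts the two spellings of the same deletion as a disagreement. For this reason, INDEL records are usually normalized (left-aligned and trimmed) before callsets are compared.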

## Abstract

Whole genome sequencing (WGS) studies play a pivotal role in studying the genetic underpinnings of human diseases and traits. High-quality, reproducible variant calling is the cornerstone for the success of downstream analyses, including WGS association studies and polygenic risk prediction. This paper compares the data quality, performance, and concordance of two widely used WGS variant callers, the Genome Analysis Toolkit (GATK) and VT, a variant tool set that discovers short variants. The comparison uses 60 532 multi-ancestry whole genomes sequenced by the Centers for Common Disease Genomics (CCDGs) of the NHGRI Genome Sequencing Program. Our findings show that, after quality control (QC), both the GATK and VT pipelines yield highly consistent and reliable single nucleotide variant (SNV) calls in large-scale WGS studies, supporting their agreement in joint variant calling. However, the two pipelines exhibit greater discrepancies in calling insertions and deletions (INDELs).
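The abstract does not specify which concordance metrics the paper reports. A common genotype-level measure for two callers run on the same samples is sketched below as an assumption. It has two parts: overall agreement on genotypes called by both pipelines, and non-reference concordance, which drops entries where both callers report homozygous reference. The table layout, the identifiers, and the `genotype_concordance` helper are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: (site_id, sample_id) -> alternate-allele dosage
# 0 (hom-ref), 1 (het), 2 (hom-alt), or None if the genotype is missing.
GenotypeTable = Dict[Tuple[str, str], Optional[int]]


def genotype_concordance(gt_a: GenotypeTable,
                         gt_b: GenotypeTable) -> Tuple[float, float]:
    """Return (overall, non-reference) concordance over entries called by both.

    Non-reference concordance ignores entries where both callers report
    hom-ref, so it is not inflated by the many 0/0 genotypes at rare sites.
    """
    shared = [k for k in gt_a.keys() & gt_b.keys()
              if gt_a[k] is not None and gt_b[k] is not None]
    if not shared:
        return float("nan"), float("nan")
    overall = sum(gt_a[k] == gt_b[k] for k in shared) / len(shared)
    nonref = [k for k in shared if gt_a[k] != 0 or gt_b[k] != 0]
    nonref_conc = (sum(gt_a[k] == gt_b[k] for k in nonref) / len(nonref)
                   if nonref else float("nan"))
    return overall, nonref_conc


# Toy example: one entry is missing in caller A and is therefore skipped.
calls_a = {("s1", "id1"): 0, ("s1", "id2"): 1, ("s2", "id1"): 2,
           ("s2", "id2"): 0, ("s3", "id1"): None}
calls_b = {("s1", "id1"): 0, ("s1", "id2"): 1, ("s2", "id1"): 1,
           ("s2", "id2"): 0, ("s3", "id1"): 1}
print(genotype_concordance(calls_a, calls_b))  # (0.75, 0.5)
```

The gap between the two numbers (0.75 vs 0.5 here) shows why non-reference concordance is the stricter summary when most genotypes are homozygous reference.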

## Full-text entities

- **Genes:** CDKN2B-AS1 (CDKN2B and CDKN2A antisense cis and trans regulatory RNA 1) [NCBI Gene 100048912] {aka 66CTG, ANRIL, CDKN2B-AS, CDKN2BAS, NCRNA00089, PCAT12}
- **Diseases:** EOCVD (MESH:D002318), XL (MESH:D000080345), coronary artery disease (MESH:D003324)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs4977574, a/G

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13023369/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC13023369/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC13023369/full.md

---
Source: https://tomesphere.com/paper/PMC13023369