# NGSTroubleFinder: a tool for detection and quantification of contamination and kinship across human NGS data

**Authors:** Samuel Valentini, Tecla Venturelli, Xavier Gallego, Lynn Durham, Laura Perez-Cano, Emre Guney

PMC · DOI: 10.1093/nargab/lqag006 · NAR Genomics and Bioinformatics · 2026-01-27

## TL;DR

NGSTroubleFinder is a new tool that detects cross-sample contamination, sample swaps, and sex mismatches in human whole-genome and whole-transcriptome sequencing data, helping to ensure data quality and integrity.

## Contribution

The novel contribution is a single tool that detects contamination, sample swaps, and sex mismatches directly from BAM/CRAM files, without a separate variant-calling step.
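Without variant calling, contamination signals have to come from raw allele counts at known polymorphic sites. The Python sketch below shows one simple way to turn such counts into a contamination estimate. It is an illustration only and is not the method or model NGSTroubleFinder uses. It assumes that per-site reference and alternate read counts have already been extracted from a pileup over known biallelic SNPs, and that a population allele frequency is available for each site. The `SiteCounts` type, the `estimate_contamination` function, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SiteCounts:
    """Read counts at one known biallelic SNP (hypothetical input format)."""
    ref: int       # reads supporting the reference allele
    alt: int       # reads supporting the alternate allele
    pop_af: float  # population alternate-allele frequency (assumed known)


def estimate_contamination(sites: Iterable[SiteCounts],
                           hom_ref_max_af: float = 0.2,
                           min_depth: int = 10) -> Optional[float]:
    """Method-of-moments contamination estimate from apparent hom-ref sites.

    If the sample is homozygous reference at a site, alternate reads can only
    come from a contaminant (sequencing error is ignored in this sketch).
    A contaminant drawn from the population carries the alternate allele with
    expected frequency pop_af, so E[alt] ~= alpha * depth * pop_af, and
    alpha ~= sum(alt) / sum(depth * pop_af) over those sites.
    """
    observed_alt = 0.0
    expected_per_unit_alpha = 0.0
    for s in sites:
        depth = s.ref + s.alt
        if depth < min_depth:
            continue  # too few reads to judge the genotype
        if s.alt / depth > hom_ref_max_af:
            continue  # looks heterozygous or hom-alt, so not used here
        observed_alt += s.alt
        expected_per_unit_alpha += depth * s.pop_af
    if expected_per_unit_alpha == 0:
        return None  # no informative sites
    return observed_alt / expected_per_unit_alpha


# Four hom-ref-like sites at depth 100 with 5 alt reads each, plus sites
# that are filtered out (heterozygous, hom-alt, and low depth).
sites = [SiteCounts(95, 5, 0.5)] * 4 + [
    SiteCounts(50, 50, 0.5), SiteCounts(5, 95, 0.5), SiteCounts(3, 1, 0.5)]
print(estimate_contamination(sites))
```

Selecting sites with a fixed allele-fraction cut-off ignores sequencing error and biases the estimate once contamination is high, so a production tool would model these effects explicitly rather than rely on this simple ratio.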

## Key findings

- NGSTroubleFinder identifies cross-sample contamination and sample swaps in NGS data.
- It provides integrated quality control, including genetic and transcriptomic sex prediction and kinship analysis.
- The tool generates detailed reports with plots and metrics in both text and HTML formats.
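Kinship checks usually compare the genotypes of two samples at shared SNPs. As an illustration, the Python sketch below implements the KING-robust within-family kinship estimator, a widely used statistic. This summary does not say which estimator NGSTroubleFinder uses, so treat it as a stand-in. It assumes that genotype dosages (0, 1, 2 copies of the alternate allele) are already available for both samples at the same sites. `king_robust_kinship` is a hypothetical helper name.

```python
from typing import Sequence


def king_robust_kinship(g1: Sequence[int], g2: Sequence[int]) -> float:
    """KING-robust (within-family) kinship coefficient between two samples.

    g1, g2: genotype dosages (0, 1, 2) at the same SNPs, in the same order;
    sites missing in either sample should be dropped beforehand.

    phi = (N_het_het - 2 * N_opposite_hom) / (N_het_1 + N_het_2)
    Rough expectations: ~0.5 duplicate or monozygotic twin, ~0.25 first
    degree, ~0.125 second degree, ~0 (possibly negative) unrelated.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same sites")
    het1 = het2 = het_het = opposite_hom = 0
    for a, b in zip(g1, g2):
        if a == 1:
            het1 += 1
        if b == 1:
            het2 += 1
        if a == 1 and b == 1:
            het_het += 1  # both samples heterozygous
        elif {a, b} == {0, 2}:
            opposite_hom += 1  # one hom-ref, the other hom-alt
    denom = het1 + het2
    if denom == 0:
        raise ValueError("no heterozygous sites; kinship is undefined")
    return (het_het - 2 * opposite_hom) / denom


genotypes = [0, 1, 2, 1, 0, 1]
print(king_robust_kinship(genotypes, genotypes))  # identical samples
```

A value near 0.5 for two samples labelled as different individuals points to a duplicate or a sample swap. A value near 0 for samples recorded as parent and child points to a pedigree or labelling error.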

## Abstract

Quality control is a critical component of any next-generation sequencing (NGS) pipeline; however, most existing pipelines emphasize technical quality assessment (e.g. read quality, alignment metrics, duplication rates) while overlooking other equally important dimensions, such as sample identity verification, contamination detection, kinship analysis, and metadata concordance. Detecting issues such as cross-sample contamination and sample swaps is essential to ensure data integrity. Here, we present NGSTroubleFinder, a novel tool that detects cross-sample contamination in human whole-genome and whole-transcriptome sequencing data, as well as sample swaps and mismatches between the reported and the inferred genetic and transcriptomic sexes. It runs directly on BAM/CRAM files without requiring additional variant-calling steps and offers an integrated quality-control pipeline for NGS data, particularly data generated in clinical studies or in research projects involving family members. It produces a detailed report that combines the results of its multiple analyses, including kinship, sex prediction, and contamination metrics. The tool reports extensive information on the samples in both text and HTML formats, including key plots for easy interpretation of the results. NGSTroubleFinder is written in Python, incorporates a custom-built parallelized pileup engine written in C, and can be easily installed with pip. The tool source code and the models are freely available on GitHub (https://github.com/STALICLA-RnD/NGSTroubleFinder), and a containerized version is available on Docker Hub (https://hub.docker.com/r/staliclarnd/ngstroublefinder).
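The comparison between reported and inferred genetic sex can be illustrated with a coverage-based rule for whole-genome data. Relative to the autosomes, XY samples show roughly half the chrX depth and clear chrY coverage, while XX samples show roughly equal chrX depth and almost no chrY coverage. The Python sketch below is a simplified illustration, not NGSTroubleFinder's implementation. The function names and the thresholds (`y_male_min`, `x_female_min`) are hypothetical. Transcriptomic sex relies on expression markers rather than depth ratios and is not covered here.

```python
def infer_genetic_sex(auto_depth: float, x_depth: float, y_depth: float,
                      y_male_min: float = 0.1,
                      x_female_min: float = 0.75) -> str:
    """Classify genetic sex from mean read depths (hypothetical thresholds)."""
    if auto_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    x_ratio = x_depth / auto_depth  # ~0.5 for XY, ~1.0 for XX
    y_ratio = y_depth / auto_depth  # clearly > 0 for XY, ~0 for XX
    has_y = y_ratio >= y_male_min
    two_x = x_ratio >= x_female_min
    if has_y and not two_x:
        return "male"
    if two_x and not has_y:
        return "female"
    # Conflicting or missing signals: possible sex-chromosome aneuploidy,
    # contamination with a sample of the other sex, or poor coverage.
    return "ambiguous"


def check_reported_sex(reported: str, inferred: str) -> str:
    """Compare metadata sex with inferred sex: 'ok', 'mismatch' or 'review'."""
    if inferred == "ambiguous":
        return "review"
    return "ok" if reported.strip().lower() == inferred else "mismatch"


inferred = infer_genetic_sex(auto_depth=30.0, x_depth=15.0, y_depth=14.0)
print(inferred, check_reported_sex("Female", inferred))
```

Returning "review" for conflicting signals, rather than forcing a call, keeps aneuploidy and contamination cases visible instead of silently reporting them as mismatches.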

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12838523/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12838523/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12838523/full.md

---
Source: https://tomesphere.com/paper/PMC12838523