# XhetRel: a pipeline for X heterozygosity and relatedness analysis of sequencing data

**Authors:** Barış Salman, Nerses Bebek, Sibel Uğur İşeri

PMC · DOI: 10.1093/bioadv/vbag002 · Bioinformatics Advances · 2026-01-22

## TL;DR

XhetRel computes X chromosome heterozygosity (Xhet) and relatedness from sequencing data to flag sex label errors, sample mix-ups, and pedigree inconsistencies.

## Contribution

XhetRel introduces a user-friendly pipeline for Xhet and relatedness analysis, accessible via Nextflow or a browser-based notebook.

## Key findings

- XhetRel enables sex-based clustering and relatedness assessment from VCF files.
- Analysis of the 1000 Genomes Project dataset showed that specific pseudogenes and gene clusters, such as SLC25A5 and the GAGE cluster, recurrently contribute to misleading variant allele fractions.
- The tool supports users without bioinformatics infrastructure and integrates into modular workflows.
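As a rough illustration of the first finding, the sketch below computes a per-sample Xhet value from VCF text lines. It is not XhetRel's code: the function names are hypothetical, Xhet is assumed here to be the share of heterozygous calls among non-reference calls on chrX, and the pseudoautosomal coordinates assume a GRCh38 reference. The pipeline's own definition and filters may differ.

```python
from collections import defaultdict

# Assumption: GRCh38 pseudoautosomal regions on chrX (1-based, inclusive).
PAR_GRCH38 = [(10_001, 2_781_479), (155_701_383, 156_030_895)]


def in_par(pos, regions=PAR_GRCH38):
    """True if a chrX position falls inside a pseudoautosomal region."""
    return any(start <= pos <= end for start, end in regions)


def classify(sample_field):
    """Map a VCF sample field to 'het', 'non_ref_hom', or None.

    Assumes GT is the first FORMAT key. Reference-only and missing calls
    return None; haploid non-reference calls count as 'non_ref_hom'.
    """
    alleles = sample_field.split(":", 1)[0].replace("|", "/").split("/")
    if "." in alleles or len(alleles) > 2:
        return None
    if len(set(alleles)) == 2:
        return "het"
    return "non_ref_hom" if alleles[0] != "0" else None


def xhet_from_vcf_lines(lines):
    """Return {sample: Xhet}, or None for samples with no usable calls."""
    counts = defaultdict(lambda: {"het": 0, "non_ref_hom": 0})
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom not in ("X", "chrX") or in_par(pos):
            continue
        for sample, sample_field in zip(samples, fields[9:]):
            kind = classify(sample_field)
            if kind:
                counts[sample][kind] += 1
    result = {}
    for sample in samples:
        c = counts[sample]
        total = c["het"] + c["non_ref_hom"]
        result[sample] = c["het"] / total if total else None
    return result


# Toy input: three non-PAR chrX sites and one PAR1 site that is skipped.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chrX\t5000000\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "chrX\t6000000\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t1",
    "chrX\t7000000\t.\tG\tA\t.\tPASS\t.\tGT\t1/1\t1/1",
    "chrX\t100000\t.\tT\tC\t.\tPASS\t.\tGT\t0/1\t0/1",
]
print(xhet_from_vcf_lines(demo))
```

In the toy input, S1 is heterozygous at two of its three counted sites and S2 at none of three, which is the kind of separation that sex-based clustering relies on. Real data would need a cutoff or clustering step chosen from the cohort's own Xhet distribution.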

## Abstract

Verification of sample sex is an essential quality control step in next-generation sequencing (NGS) studies, typically assessed from genomic data. Clustering individuals by X chromosome heterozygosity (Xhet) and incorporating relatedness estimates offers a practical first-pass screen for potential sex label errors, sample mix-ups, and pedigree inconsistencies. To better interpret Xhet-based patterns, we further investigated their biological and technical origins using the 1000 Genomes Project dataset.

We developed XhetRel, a user-friendly workflow and notebook application that computes Xhet and performs relatedness estimation directly from VCF files. As a fully genotype-based approach, XhetRel enables both sex-based clustering and relatedness assessment as an initial quality control (QC) step in NGS. XhetRel serves groups without bioinformatics infrastructure, users requiring a browser-based QC tool, and workflow developers seeking a modular Nextflow component. Our investigation into the sources of Xhet variation highlighted important limitations in sequencing and variant-calling approaches. In particular, specific pseudogenes and gene clusters, such as SLC25A5 and the GAGE cluster, emerged as recurrent contributors to misleading variant allele fractions.

The source code and data are available at Figshare (doi: 10.6084/m9.figshare.28280414). XhetRel can be executed locally via Nextflow or accessed directly through the online Colab notebook at https://colab.research.google.com/drive/1ep69JvXLwK5ndHUQ8qIGTWvauzsTW9fi.
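The abstract does not say which estimator XhetRel uses for relatedness, so the sketch below substitutes a widely used genotype-only one, the KING-robust kinship coefficient, purely to illustrate what relatedness estimation from VCF genotypes can look like. The function name and the dosage encoding (0, 1, 2 alternate alleles, `None` for missing) are assumptions made for this example, and such estimates are normally computed over many autosomal biallelic SNPs.

```python
def king_robust_kinship(g_i, g_j):
    """KING-robust kinship estimate for one pair of samples.

    g_i, g_j: equal-length sequences of alt-allele dosages (0, 1, 2),
    with None for missing calls. Returns None when either sample has
    no heterozygous site among the jointly called sites.
    """
    het_i = het_j = het_both = opp_hom = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # use only sites called in both samples
        het_i += a == 1
        het_j += b == 1
        het_both += a == 1 and b == 1
        opp_hom += {a, b} == {0, 2}  # opposite homozygotes (AA vs aa)
    min_het = min(het_i, het_j)
    if min_het == 0:
        return None
    return (
        (het_both - 2 * opp_hom) / (2 * min_het)
        + 0.5
        - (het_i + het_j) / (4 * min_het)
    )


# Identical genotype vectors (a duplicate sample) give the maximum value.
duplicate = [0, 1, 2, 1, 1, 0]
print(king_robust_kinship(duplicate, duplicate))
```

Under this estimator the expected value is about 0.5 for duplicates or monozygotic twins, 0.25 for first-degree relatives, 0.125 for second-degree relatives, and close to 0 for unrelated pairs. Comparing such values against the reported pedigree and sex labels is one way to surface sample mix-ups.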

## Linked entities

- **Genes:** SLC25A5 (solute carrier family 25 member 5) [NCBI Gene 292]

## Full-text entities

- **Genes:** SLC25A5 (solute carrier family 25 member 5) [NCBI Gene 292] {aka 2F1, AAC2, ANT2, T2, T3}

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12883445/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12883445/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12883445/full.md

---
Source: https://tomesphere.com/paper/PMC12883445