# PG-SCUnK: measuring pangenome graph representativeness using single-copy and universal K-mers

**Authors:** Tristan Cumer, Sotiria Milia, Alexander S. Leonard, Hubert Pausch

PMC · DOI: 10.1186/s12859-025-06355-2 · 2025-12-30

## TL;DR

PG-SCUnK is an open-source tool that evaluates how well a pangenome graph represents its source assemblies by checking whether their single-copy and universal k-mers are uniquely and completely represented within graph nodes.

## Contribution

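The method quantifies the representativeness of a pangenome graph by counting how many single-copy and universal k-mers from the source assemblies are uniquely and completely represented within the graph nodes.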
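As a rough illustration of what "single-copy and universal" means, the sketch below collects the k-mers that occur exactly once in every assembly. It is a toy example, not the PG-SCUnK implementation: the function names `kmer_counts` and `single_copy_universal_kmers` are hypothetical, each assembly is assumed to be a single uppercase string, and reverse complements and multi-contig assemblies are ignored.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer (sliding window of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def single_copy_universal_kmers(assemblies: dict[str, str], k: int) -> set[str]:
    """Return k-mers present exactly once in each assembly (hypothetical helper).

    Assumption: forward strand only, one sequence per assembly.
    """
    shared = None
    for seq in assemblies.values():
        counts = kmer_counts(seq, k)
        # Single-copy: the k-mer occurs exactly once in this assembly.
        single = {kmer for kmer, n in counts.items() if n == 1}
        # Universal: keep only k-mers that are single-copy in every assembly so far.
        shared = single if shared is None else shared & single
    return shared if shared is not None else set()


toy = {"asm_a": "ACGTACGG", "asm_b": "TACGGTACG"}
scunk = single_copy_universal_kmers(toy, k=4)
print(sorted(scunk))
```

In this toy input, `TACG` occurs twice in `asm_b`, so it is excluded even though both assemblies contain it.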

## Key findings

- PG-SCUnK identifies fractions of unique, duplicated, and split k-mers in pangenome graphs.
- These k-mer fractions correlate with short-read mapping rates to the pangenome graph.
- Insights from PG-SCUnK help optimize pangenome graph construction parameters.

## Abstract

Pangenome graphs integrate multiple assemblies to represent non-redundant genetic diversity. However, current evaluations of pangenome graphs rely primarily on technical parameters (e.g., total length, number of nodes/edges, growth curves), which fail to assess how effectively the graph represents homologous stretches across the integrated assemblies and how well short reads align against pangenome graph references.

We introduce a novel method to quantitatively assess how well a pangenome graph represents its integrated assemblies. Our method quantifies how many single-copy and universal k-mers from the source assemblies are uniquely and completely represented within the graph nodes. Implemented in the open-source tool PG-SCUnK, this approach identifies the fractions of unique, duplicated, and split k-mers, which correlate with short read mapping rates to the pangenome graph.
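The classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: `classify_kmers` is a hypothetical name, node sequences are assumed to be given as plain strings (for example, taken from the segment lines of a GFA file), and the category definitions are one reading of the abstract: a k-mer is "unique" if it lies wholly inside graph nodes exactly once, "duplicated" if it does so more than once, and "split" if no single node contains it in full.

```python
from collections import Counter


def classify_kmers(kmers: set[str], node_seqs: list[str]) -> dict[str, float]:
    """Return the fractions of unique, duplicated, and split k-mers.

    Assumptions: all k-mers share the same length, forward strand only.
    """
    if not kmers:
        return {"unique": 0.0, "duplicated": 0.0, "split": 0.0}
    k = len(next(iter(kmers)))
    hits = Counter()
    # Count occurrences of each target k-mer that fall entirely within one node.
    for seq in node_seqs:
        for i in range(len(seq) - k + 1):
            sub = seq[i:i + k]
            if sub in kmers:
                hits[sub] += 1
    total = len(kmers)
    unique = sum(1 for km in kmers if hits[km] == 1)
    duplicated = sum(1 for km in kmers if hits[km] > 1)
    # Anything never seen whole inside a node is treated as split in this sketch.
    split = total - unique - duplicated
    return {
        "unique": unique / total,
        "duplicated": duplicated / total,
        "split": split / total,
    }


nodes = ["GTACG", "ACGGA", "TTACGT"]
fractions = classify_kmers({"TACG", "ACGG", "GTAC", "GGAT"}, nodes)
print(fractions)
```

Here `TACG` appears in two nodes (duplicated), `GGAT` is not contained in any single node (counted as split), and the other two k-mers are unique. Note that this sketch cannot tell a k-mer spread across adjacent nodes from one that is missing altogether; it relies on the graph having been built from the same assemblies.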

Insights provided by PG-SCUnK facilitate the selection of appropriate parameters to build optimal reference pangenome graphs.

The online version contains supplementary material available at 10.1186/s12859-025-06355-2.

## Full-text entities

- **Species:** Bos taurus (bovine, species) [taxon 9913], Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12859900/full.md

---
Source: https://tomesphere.com/paper/PMC12859900