PG-SCUnK: measuring pangenome graph representativeness using single-copy and universal K-mers
Tristan Cumer, Sotiria Milia, Alexander S. Leonard, Hubert Pausch

TL;DR
PG-SCUnK is a new tool that evaluates how well pangenome graphs represent genetic diversity by analyzing k-mers from source assemblies.
Contribution
The novel method quantifies the representativeness of pangenome graphs using single-copy and universal k-mers.
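To make "single-copy and universal" concrete: a k-mer qualifies if it occurs exactly once in every source assembly. Below is a toy, in-memory sketch of that definition. It assumes canonical k-mers (a k-mer and its reverse complement are treated as the same) and skips windows containing non-ACGT bases. The function names are hypothetical and this is not PG-SCUnK's implementation; a whole-genome tool would presumably use a dedicated k-mer counter instead.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(contigs: list[str], k: int) -> Counter:
    """Count canonical k-mers across all contigs of one assembly.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    counts = Counter()
    for seq in contigs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def single_copy_universal(assemblies: list[list[str]], k: int) -> set[str]:
    """Return k-mers that occur exactly once in every assembly (SCUnKs)."""
    result = None
    for contigs in assemblies:
        single = {km for km, c in count_kmers(contigs, k).items() if c == 1}
        result = single if result is None else result & single
    return result or set()
```

With `k = 3`, the assemblies `["AAACCC"]`, `["AAACCCAAA"]` and `["GGGTT"]` share exactly `{"AAC", "ACC", "CCC"}`. `AAA` is dropped because it occurs twice in the second assembly, and the third assembly matches only through reverse complements.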
Findings
PG-SCUnK identifies fractions of unique, duplicated, and split k-mers in pangenome graphs.
The fractions of unique, duplicated, and split k-mers correlate with short-read mapping rates to the graph.
Insights from PG-SCUnK help optimize pangenome graph construction parameters.
Abstract
Pangenome graphs integrate multiple assemblies to represent non-redundant genetic diversity. However, current evaluations of pangenome graphs rely primarily on technical parameters (e.g., total length, number of nodes/edges, growth curves), which fail to assess how effectively the graph represents homologous stretches across the integrated assemblies and how well short reads align against pangenome graph references. We introduce a novel method to quantitatively assess how well a pangenome graph represents its integrated assemblies. Our method quantifies how many single-copy and universal k-mers from the source assemblies are uniquely and completely represented within the graph nodes. Implemented in the open-source tool PG-SCUnK, this approach identifies the fractions of unique, duplicated, and split k-mers, which correlate with short read mapping rates to the pangenome graph. Insights…
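The classification step can be sketched as follows. Each single-copy universal k-mer is looked up among the k-mers that lie entirely inside a single graph node. It is "unique" if it is found once, "duplicated" if it is found more than once, and "split" if it is found in no node. Three assumptions apply. First, node sequences are read from GFA1 segment (`S`) lines. Second, k-mers are canonicalised. Third, every absent k-mer is counted as split, on the reasoning that a graph built from all source assemblies must contain it, so it can only be spanning a node boundary. The function names are hypothetical, and the published tool may handle edge cases differently.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")
CATEGORIES = ("unique", "duplicated", "split")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def node_sequences(gfa_lines):
    """Yield segment sequences from GFA1 'S' lines (S <name> <sequence> ...)."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            yield fields[2]


def node_kmer_counts(nodes, k: int) -> Counter:
    """Count canonical k-mers fully contained within a single node sequence."""
    counts = Counter()
    for seq in nodes:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def classify_scunks(scunks, nodes, k: int) -> dict:
    """Return the fraction of SCUnKs that are unique, duplicated, or split in the graph."""
    counts = node_kmer_counts(nodes, k)
    tally = Counter()
    for kmer in scunks:
        n = counts[kmer]  # Counter returns 0 for k-mers absent from all nodes
        if n == 1:
            tally["unique"] += 1
        elif n > 1:
            tally["duplicated"] += 1
        else:
            # Assumed to span a node boundary (see lead-in).
            tally["split"] += 1
    total = len(scunks)
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: tally[cat] / total for cat in CATEGORIES}
```

For example, with segments `AAAC`, `GTT` and `CCC` and the k-mers `{"AAC", "ACC", "CCC"}` at `k = 3`, `AAC` is duplicated (once in `AAAC`, once as the reverse complement `GTT`), `CCC` is unique, and `ACC` is split. Each category therefore receives one third.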
Figure 1
Figure 2
Figure 3
