# PHScaffolding: a hypergraph clustering and dual-weight integration strategy for scaffolding with Pore-C reads

**Authors:** Quan Su, Junwei Luo, Fei Guo

PMC · DOI: 10.1093/bib/bbag003 · 2026-01-22

## TL;DR

PHScaffolding is a new method for genome scaffolding that uses Pore-C data to improve accuracy and continuity compared to traditional Hi-C-based methods.

## Contribution

Introduces a hypergraph clustering and dual-weight integration strategy specifically for scaffolding with Pore-C reads.

## Key findings

- PHScaffolding outperforms traditional Hi-C-based methods in terms of NA50 and NGA50 metrics.
- The method shows robust performance across human and plant genome datasets.
- It achieves lower misassembly rates than existing scaffolding approaches.
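
For readers unfamiliar with the metrics named above, the sketch below shows how an N50-style statistic is commonly computed. It assumes the usual QUAST-style conventions, which this summary does not spell out: NA50 and NGA50 are calculated over aligned blocks (scaffolds broken at misassembly breakpoints, unaligned bases removed) rather than over whole scaffolds, and NGA50 measures coverage against the reference genome length. The function name `nx50` and its parameters are illustrative and do not come from the paper or from any evaluation tool.

```python
def nx50(block_lengths, total_length=None):
    """Length L such that blocks of length >= L cover half of total_length.

    block_lengths: scaffold lengths (N50/NG50) or aligned-block lengths
                   (NA50/NGA50); producing aligned blocks is out of scope here.
    total_length:  size to cover half of. Defaults to the sum of the given
                   blocks; pass the reference genome size for NG50/NGA50.
    """
    if total_length is None:
        total_length = sum(block_lengths)
    target = total_length / 2
    covered = 0
    # Walk blocks from longest to shortest until half of the target is covered.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0  # the blocks never reach half of total_length


# Hypothetical aligned-block lengths, for illustration only.
blocks = [80, 70, 50, 40, 30, 20]
na50_like = nx50(blocks)                     # relative to the blocks' own total
nga50_like = nx50(blocks, total_length=400)  # relative to an assumed reference size
```

Because the reference-based variant fixes the denominator, it stays comparable across assemblies of different total size, and it drops to 0 when the aligned blocks cover less than half of the reference.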

## Abstract

Genome assembly aims to construct chromosome-level genome sequences. Scaffolding is a critical step in this process, and its accuracy depends heavily on the quality of the input data. Although both Hi-C and Pore-C technologies are used to study 3D genome structure, Pore-C demonstrates irreplaceable advantages in high-precision assembly because it captures long-range information and multi-fragment interactions. However, most current scaffolding methods rely primarily on Hi-C data, whose inherent technical constraints limit assembly continuity and accuracy. We propose PHScaffolding, a scaffolding method based on Pore-C data. The method constructs a hypergraph from the alignments of Pore-C reads to capture multi-way interactions among contigs, and introduces a dedicated weighting scheme for the hyperedges. PHScaffolding then applies the Louvain algorithm to cluster the hypergraph, aiming to group contigs that originate from the same chromosome. Finally, for the contigs within each cluster, the method employs a novel strategy to orient and order them based on Pore-C read alignments, thereby generating chromosome-level scaffolds. Evaluations on HG002, GM12878, and Arabidopsis thaliana contig datasets demonstrate that PHScaffolding achieves strong performance and robustness in terms of NA50, NGA50, and misassembly rates. Comparative experiments show that it outperforms traditional Hi-C-based scaffolding methods. The source code of PHScaffolding is available at https://github.com/Suquana/PHScaffolding.
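
The hypergraph construction described in the abstract can be illustrated with a minimal sketch. This is not the PHScaffolding implementation. It assumes each Pore-C read has already been reduced to the list of contigs its fragments align to, and it treats every read that touches two or more contigs as one hyperedge. The `1 / (k - 1)` clique-expansion weight is a generic placeholder, not the paper's dual-weight scheme, and the function names and input format are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_hyperedges(read_alignments):
    """Count hyperedges: one per read whose fragments hit >= 2 distinct contigs.

    read_alignments: dict mapping read id -> list of contig ids (assumed input).
    Returns a Counter mapping frozenset of contigs -> number of supporting reads.
    """
    hyperedges = Counter()
    for contigs in read_alignments.values():
        edge = frozenset(contigs)
        if len(edge) >= 2:  # single-contig reads carry no inter-contig signal
            hyperedges[edge] += 1
    return hyperedges


def clique_expand(hyperedges):
    """Project hyperedges onto pairwise contig weights.

    Each hyperedge of size k adds count / (k - 1) to every contig pair it
    contains (a placeholder weighting, not the paper's scheme).
    """
    weights = defaultdict(float)
    for edge, count in hyperedges.items():
        share = count / (len(edge) - 1)
        for a, b in combinations(sorted(edge), 2):
            weights[(a, b)] += share
    return dict(weights)


# Toy input: r1 links three contigs at once, which a pairwise Hi-C contact cannot.
reads = {
    "r1": ["ctg1", "ctg2", "ctg3"],
    "r2": ["ctg1", "ctg2"],
    "r3": ["ctg4"],
    "r4": ["ctg2", "ctg1", "ctg1"],
}
edges = build_hyperedges(reads)
pair_weights = clique_expand(edges)
```

The resulting pairwise weights could then be passed to any Louvain implementation to obtain candidate chromosome groups. Whether PHScaffolding clusters a projection like this one or operates on the hypergraph more directly is not stated in the abstract.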

## Linked entities

- **Species:** Arabidopsis thaliana (taxon 3702)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12825295/full.md

---
Source: https://tomesphere.com/paper/PMC12825295