# Benchmarking alignment strategies for Hi-C reads in metagenomic Hi-C data

**Authors:** Yuqiu Wang, Wenxuan Zuo, Jiawei Huang, Fengzhu Sun, Yuxuan Du

PMC · DOI: 10.1186/s13059-026-03970-x · 2026-01-30

## TL;DR

This paper compares different methods for aligning Hi-C reads in metagenomic data to determine which ones best support accurate analysis of microbial communities.

## Contribution

The study introduces a focused benchmark of Hi-C alignment strategies specifically for metagenomic data, highlighting performance trade-offs between accuracy and efficiency.

## Key findings

- BWA MEM -5SP yielded more inter-contig Hi-C read pairs and better binning quality than the other strategies in most environments, followed by BWA MEM default.
- Chromap and Minimap2 were the most computationally efficient, but were less effective on inter-contig read pairs and binning quality.
- Alignment performance varied significantly across synthetic and real-world metagenomic datasets.

## Abstract

Metagenomics combined with High-throughput Chromosome Conformation Capture (Hi-C) provides a powerful approach to study microbial communities by linking genomic content with spatial interactions. Hi-C complements shotgun sequencing by revealing taxonomic composition, functional interactions, and genomic organization within a single sample. However, aligning Hi-C reads to metagenomic contigs is challenging due to variable insert sizes of Hi-C paired-end reads, multi-species complexity, and gaps in assemblies. Although several benchmark studies have evaluated general alignment tools and Hi-C data alignment, none have specifically focused on metagenomic Hi-C data.

We evaluated seven alignment strategies commonly used in Hi-C analyses: BWA MEM -5SP, BWA MEM default, BWA aln default, Bowtie2 default, Bowtie2 --very-sensitive-local, Minimap2 default, and Chromap Hi-C default. We benchmarked these tools on one synthetic dataset and seven real-world environments. Performance was assessed based on the number of inter-contig Hi-C read pairs and their impact on downstream tasks, such as binning quality.
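To make the seven strategies concrete, the sketch below builds one plausible command line for each. It is an illustration only: the abstract does not give the exact invocations, so the argument order, the use of `bwa sampe` after `bwa aln`, the `-a` flag for Minimap2, and all file names (`r1.sai`, `r2.sai`, `aln.pairs`) are assumptions, as is the helper name `build_alignment_commands`. Indexes are assumed to exist already, and tools that write to stdout would need their output captured by the caller.

```python
from typing import Dict, List


def build_alignment_commands(
    contigs_fa: str,      # assembled contigs, assumed already indexed with `bwa index`
    bt2_index: str,       # Bowtie2 index basename (assumed built with bowtie2-build)
    chromap_index: str,   # Chromap index file (assumed built beforehand)
    r1: str,              # Hi-C forward reads (FASTQ)
    r2: str,              # Hi-C reverse reads (FASTQ)
) -> Dict[str, List[List[str]]]:
    """Return, per strategy, the list of commands (argv lists) to run in order.

    These are hypothetical invocations for illustration, not the paper's scripts.
    """
    return {
        # -5, -S and -P are the BWA MEM options commonly combined for Hi-C reads.
        "BWA MEM -5SP": [["bwa", "mem", "-5SP", contigs_fa, r1, r2]],
        "BWA MEM default": [["bwa", "mem", contigs_fa, r1, r2]],
        # bwa aln aligns each end separately; pairing via sampe is an assumption.
        "BWA aln default": [
            ["bwa", "aln", "-f", "r1.sai", contigs_fa, r1],
            ["bwa", "aln", "-f", "r2.sai", contigs_fa, r2],
            ["bwa", "sampe", contigs_fa, "r1.sai", "r2.sai", r1, r2],
        ],
        "Bowtie2 default": [["bowtie2", "-x", bt2_index, "-1", r1, "-2", r2]],
        "Bowtie2 --very-sensitive-local": [
            ["bowtie2", "--very-sensitive-local", "-x", bt2_index, "-1", r1, "-2", r2]
        ],
        # -a requests SAM output; no preset is passed, matching "default".
        "Minimap2 default": [["minimap2", "-a", contigs_fa, r1, r2]],
        "Chromap Hi-C default": [
            ["chromap", "--preset", "hic", "-x", chromap_index, "-r", contigs_fa,
             "-1", r1, "-2", r2, "-o", "aln.pairs"]
        ],
    }


if __name__ == "__main__":
    cmds = build_alignment_commands("contigs.fa", "contigs_bt2", "contigs.index",
                                    "hic_R1.fastq", "hic_R2.fastq")
    for name, steps in cmds.items():
        for argv in steps:
            print(f"{name}: {' '.join(argv)}")
```

Returning argv lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` without quoting concerns.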

We show that BWA MEM -5SP generally outperformed all other tools across most environments in terms of inter-contig read pairs and binning quality, followed by BWA MEM default. Chromap and Minimap2, while less effective in these metrics, demonstrated the highest computational efficiency.
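The inter-contig read pair metric can be illustrated with a small counter over SAM records. This is a minimal sketch under stated assumptions, not the paper's pipeline: it counts a pair once (via the first-in-pair record), requires both mates to be mapped, ignores secondary and supplementary alignments, and applies an optional mapping-quality cutoff to the first mate only. The function name `count_inter_contig_pairs` and the `min_mapq` parameter are hypothetical, and the abstract does not state which filters the authors applied.

```python
from typing import Iterable

# SAM FLAG bits used below (from the SAM format specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_inter_contig_pairs(sam_lines: Iterable[str], min_mapq: int = 0) -> int:
    """Count read pairs whose two mates align to different contigs."""
    count = 0
    for line in sam_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        rnext = fields[6]
        # Use only the first-in-pair record so each pair is counted once.
        if not (flag & PAIRED) or not (flag & FIRST_IN_PAIR):
            continue
        # Require both mates mapped; drop secondary/supplementary records.
        if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if mapq < min_mapq:
            continue
        # RNEXT is "=" when the mate is on the same contig, "*" when unknown.
        if rnext not in ("=", "*") and rnext != rname:
            count += 1
    return count


if __name__ == "__main__":
    example = [
        "@SQ\tSN:contigA\tLN:1000",
        "p1\t65\tcontigA\t100\t60\t4M\tcontigB\t200\t0\tACGT\tIIII",
        "p1\t129\tcontigB\t200\t60\t4M\tcontigA\t100\t0\tACGT\tIIII",
        "p2\t65\tcontigA\t100\t60\t4M\t=\t400\t304\tACGT\tIIII",
    ]
    print(count_inter_contig_pairs(example))
```

In the example, only pair `p1` spans two contigs; `p2` has both mates on `contigA` and is not counted.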

The online version contains supplementary material available at 10.1186/s13059-026-03970-x.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Bos taurus (bovine, species) [taxon 9913], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Sus scrofa (pig, species) [taxon 9823], Ovis aries (domestic sheep, species) [taxon 9940]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12964890/full.md

---
Source: https://tomesphere.com/paper/PMC12964890