# Enhancing variant detection in complex genomes: leveraging linked reads for robust SNP, Indel, and structural variant analysis

**Authors:** Can Luo, Yichen Henry Liu, Han Liu, Zhenmiao Zhang, Lu Zhang, Brock A. Peters, Xin Maizie Zhou

PMC · DOI: 10.21203/rs.3.rs-8408441/v1 · Research Square · 2026-01-12

## TL;DR

This paper compares three stLFR linked-read configurations (paired-end PE100_stLFR and single-end SE500_stLFR and SE1000_stLFR) on simulated HG002 data to show how read design affects the accuracy of SNP, INDEL, and structural variant detection in complex genomes.

## Contribution

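The study introduces stLFR-sim, a Python-based simulator of the stLFR linked-read sequencing workflow, and uses it to evaluate extended barcoded single-end reads (500 bp and 1000 bp) against standard paired-end linked reads for variant detection.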

## Key findings

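- Extended single-end reads (SE500_stLFR and SE1000_stLFR) improve structural variant detection over paired-end PE100_stLFR reads, with SE1000_stLFR giving the best balance between precision and recall.
- Hybrid libraries combining paired-end and single-end barcoded reads consistently outperform single libraries across small variant types and genomic contexts.
- Shorter paired-end reads (PE100_stLFR) show higher precision for SNPs and INDELs in high-confidence regions, but perform worse in low-mappability contexts.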

## Abstract

Accurate detection of genetic variants, including single nucleotide polymorphisms (SNPs), small insertions and deletions (INDELs), and structural variants (SVs), is critical for comprehensive genomic analysis. While traditional short-read sequencing performs well for SNP and INDEL detection, it struggles to resolve SVs, especially in complex genomic regions, due to inherent read length limitations. Linked-read sequencing technologies, such as single-tube Long Fragment Read sequencing (stLFR), overcome these challenges by employing molecular barcodes, providing crucial long-range information.
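As a rough illustration of how shared barcodes carry long-range information, the sketch below groups aligned reads by barcode and merges nearby reads into inferred long fragments. The input tuple layout, the `infer_fragments` name, and the 50 kb `max_gap` threshold are assumptions made for this example; they are not taken from the paper or from any stLFR tool.

```python
from collections import defaultdict


def infer_fragments(reads, max_gap=50_000):
    """Cluster barcoded read alignments into inferred long fragments.

    reads: iterable of (barcode, chrom, start, end) tuples (hypothetical layout).
    max_gap: assumed maximum distance between reads of the same fragment.
    Returns a sorted list of (barcode, chrom, frag_start, frag_end, n_reads).
    """
    # Reads sharing a barcode and chromosome are candidates for one fragment.
    by_key = defaultdict(list)
    for barcode, chrom, start, end in reads:
        by_key[(barcode, chrom)].append((start, end))

    fragments = []
    for (barcode, chrom), spans in by_key.items():
        spans.sort()
        frag_start, frag_end = spans[0]
        n_reads = 1
        for start, end in spans[1:]:
            if start - frag_end > max_gap:
                # Gap too large: close the current fragment, start a new one.
                fragments.append((barcode, chrom, frag_start, frag_end, n_reads))
                frag_start, frag_end, n_reads = start, end, 1
            else:
                frag_end = max(frag_end, end)
                n_reads += 1
        fragments.append((barcode, chrom, frag_start, frag_end, n_reads))
    return sorted(fragments)


# Toy input: coordinates are invented purely for illustration.
example_reads = [
    ("BC1", "chr1", 1_000, 1_100),
    ("BC1", "chr1", 21_000, 21_100),
    ("BC1", "chr1", 500_000, 500_100),
    ("BC2", "chr1", 3_000, 3_100),
]
example_fragments = infer_fragments(example_reads)
```

In this toy input, the two `BC1` reads about 20 kb apart are merged into one inferred fragment, while the distant third read starts a separate one. Real pipelines must also handle barcode collisions and sequencing errors, which this sketch ignores.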

This study investigates traditional paired-end linked reads and a conceptual extension of linked-read technology: barcoded single-end reads of 500 bp (SE500_stLFR) and 1000 bp (SE1000_stLFR), generated using the stLFR platform. Unlike conventional paired-end (PE100_stLFR) linked reads, these longer single-end reads could offer improved resolution for variant detection by leveraging extended read lengths per barcode. To explore the potential of stLFR reads, we developed stLFR-sim, a Python-based simulator that reproduces the stLFR linked-read sequencing workflow to enable realistic simulation and benchmarking of linked-read sequencing data. With stLFR-sim, we simulated a diverse set of datasets for the HG002 sample using T2T-based realistic genome simulation. Variant detection performance was then systematically assessed across three stLFR configurations: standard PE100_stLFR, SE500_stLFR, and SE1000_stLFR.
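To make the difference between the three configurations concrete, the sketch below estimates how many sequenced templates each would need to reach the same depth, using the standard relation coverage = (templates × bases per template) / genome size. The 30x target, the 3.1 Gb genome size, and the reading of PE100 as 2 × 100 bp per pair are assumptions for illustration; the summary does not state the coverage used in the study.

```python
# Bases sequenced per template for each configuration.
# Assumption: PE100 is read as one pair of 2 x 100 bp reads.
BASES_PER_TEMPLATE = {
    "PE100_stLFR": 2 * 100,
    "SE500_stLFR": 500,
    "SE1000_stLFR": 1000,
}


def templates_needed(coverage, genome_size, bases_per_template):
    """Smallest number of templates giving at least `coverage` mean depth."""
    total_bases = coverage * genome_size
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-total_bases // bases_per_template)


# Hypothetical sequencing target, not taken from the paper.
TARGET_COVERAGE = 30
GENOME_SIZE = 3_100_000_000

requirements = {
    name: templates_needed(TARGET_COVERAGE, GENOME_SIZE, bases)
    for name, bases in BASES_PER_TEMPLATE.items()
}

for name, count in requirements.items():
    print(f"{name}: {count:,} templates")
```

Under these assumptions, the longer single-end designs reach the same depth with fewer but longer reads, which is the trade-off the configurations are compared on.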

Benchmarking against the Genome in a Bottle (GIAB) gold standard reveals distinct strengths of each configuration. Extended single-end reads (SE500_stLFR and SE1000_stLFR) significantly enhance SV detection, with SE1000_stLFR providing the best balance between precision and recall. In contrast, the shorter PE100_stLFR reads exhibit higher precision for SNP and INDEL calling, particularly within high-confidence regions, though with reduced performance in low-mappability contexts. To explore optimization strategies, we constructed hybrid libraries combining paired-end and single-end barcoded reads. These hybrid approaches integrate the complementary advantages of different read types, consistently outperforming single libraries across small variant types and genomic contexts.
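The precision and recall comparisons above rest on the usual benchmarking definitions, which can be sketched as follows. The function name and the counts in the usage lines are hypothetical placeholders, not results from the paper, and the sketch assumes that true positives, false positives, and false negatives have already been obtained by comparing calls against a truth set such as GIAB.

```python
def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from variant-call comparison counts.

    tp: calls matching the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for illustration only.
precision, recall, f1 = benchmark_metrics(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A configuration with high precision but low recall, or the reverse, scores a lower F1 than one that balances the two, which is one common way to express the "best balance" described for SE1000_stLFR.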

Collectively, our findings offer a robust comparative framework for evaluating stLFR sequencing strategies, highlight the promise of barcoded single-end reads for improving SV detection, and provide practical guidance for tailoring sequencing designs to the complexities of the genome.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12869651/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12869651/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12869651/full.md

---
Source: https://tomesphere.com/paper/PMC12869651