# Joint processing of long- and short-read sequencing data with deep learning improves variant calling

**Authors:** Gennaro Gambardella

PMC · DOI: 10.1016/j.crmeth.2025.101107 · 2025-07-15

## TL;DR

Combining short- and long-read sequencing data with deep learning improves accuracy and cost-efficiency in detecting genetic variants.

## Contribution

A hybrid DeepVariant model that jointly processes Illumina and Nanopore data improves germline variant detection accuracy.

## Key findings

- Shallow hybrid sequencing matches or surpasses single-technology methods in variant detection accuracy.
- Hybrid sequencing enables detection of large structural variations while reducing sequencing costs.
- Joint modeling of hybrid inputs improves DeepVariant's performance compared to single-technology approaches.

## Abstract

Despite the complementary strengths of short- and long-read sequencing approaches, variant-calling methods still rely on a single data type. In this study, we collected and harmonized Nanopore datasets of the seven healthy individuals in the Genome in a Bottle (GIAB) project across three independent consortia. By leveraging these harmonized Nanopore data, we explore the benefits of using a hybrid DeepVariant model to jointly process Illumina and Nanopore data for germline variant detection. We show that a shallow hybrid long-short sequencing approach can match or surpass the germline variant detection accuracy of state-of-the-art single-technology methods, potentially reducing overall sequencing costs and enabling the detection of large germline structural variations. These findings hold great promise for molecular diagnostics in clinical settings, particularly for rare genetic disease screening.

- Hybrid short- and long-read sequencing improves variant detection
- Joint modeling of hybrid inputs improves DeepVariant’s performance
- Shallow hybrid sequencing yields competitive performance to deep single-tech sequencing
- Shallow hybrid sequencing may lower costs in large-scale clinical screening

Short-read and long-read sequencing technologies offer distinct yet complementary strengths. Short reads excel at detecting small variants but struggle in complex or repetitive genome regions. In contrast, long reads provide better coverage of these regions and are well suited for identifying large structural variants, although their higher error rates can compromise the detection of small variants. Despite these complementary strengths, current variant-calling methods still rely on a single sequencing data type. As a result, there is an urgent need for hybrid approaches that can integrate both sequencing methods, leveraging their respective advantages to correct biases and improve overall detection accuracy.

Gambardella shows that shallow short- and long-read sequencing combined with a retrained DeepVariant model improves small variant detection. This hybrid strategy offers a cost-effective alternative to deep single-tech sequencing and enables unified variant calling instead of post hoc merging across platforms.
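To make the contrast with post hoc merging concrete, the following is a minimal sketch of what merging two independently produced call sets might look like. It is not the paper's method: the hybrid model calls variants from both data types jointly, whereas this toy code only reconciles finished call sets. The `Variant` type and the `merge_post_hoc` function are hypothetical names introduced for illustration, and variants are matched by exact equality of chromosome, position, reference allele, and alternate allele, which is a simplifying assumption.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record (illustration only)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def merge_post_hoc(
    short_calls: Iterable[Variant],
    long_calls: Iterable[Variant],
) -> Dict[str, Set[Variant]]:
    """Partition two call sets into shared and platform-specific calls.

    Assumption: two calls are the same variant only if all four fields
    match exactly. Real pipelines normally normalize representations first.
    """
    short_set, long_set = set(short_calls), set(long_calls)
    return {
        "both": short_set & long_set,        # supported by both platforms
        "short_only": short_set - long_set,  # seen only in short-read calls
        "long_only": long_set - short_set,   # seen only in long-read calls
    }


# Toy data, invented for this example
short_calls = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T")]
long_calls = [Variant("chr1", 100, "A", "G"), Variant("chr2", 300, "G", "GA")]
merged = merge_post_hoc(short_calls, long_calls)
```

In a merge of this kind, calls found by only one platform have to be accepted or rejected by some external rule. A joint model avoids that step by evaluating the evidence from both data types at each candidate site before making a call.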

## Full-text entities

- **Diseases:** genetic disease (MESH:D030342)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12296420/full.md

---
Source: https://tomesphere.com/paper/PMC12296420