# DL-GapFilling: a novel deep learning framework for improved plant genome gap filling

**Authors:** Yu Chen, Zihao Wang, Gang Wang, Guohua Wang

PMC · DOI: 10.1093/bib/bbag007 · Briefings in Bioinformatics · 2026-01-26

## TL;DR

DL-GapFilling is a deep learning framework that improves plant and algal genome assembly by efficiently filling gaps in genomic sequences.
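In assembled scaffolds, gaps are conventionally written as runs of `N` bases, and the abstract below notes that DL-GapFilling works from the sequence flanking each gap. To illustrate that input, the minimal sketch below locates `N` runs in a scaffold string and extracts fixed-length left and right flanks. It is not code from DL-GapFilling: `find_gaps`, `Gap`, `flank_len`, and `min_gap` are hypothetical names chosen for this example.

```python
import re
from typing import List, NamedTuple


class Gap(NamedTuple):
    start: int        # 0-based index of the first N in the run
    end: int          # index one past the last N
    left_flank: str   # up to flank_len bases immediately before the gap
    right_flank: str  # up to flank_len bases immediately after the gap


def find_gaps(scaffold: str, flank_len: int = 10, min_gap: int = 1) -> List[Gap]:
    """Return every run of N/n at least min_gap long, with its flanking sequence."""
    gaps = []
    for m in re.finditer(r"[Nn]+", scaffold):
        if m.end() - m.start() < min_gap:
            continue  # too short to treat as a gap
        left = scaffold[max(0, m.start() - flank_len):m.start()]
        right = scaffold[m.end():m.end() + flank_len]
        gaps.append(Gap(m.start(), m.end(), left, right))
    return gaps


scaffold = "ACGTACGTAC" + "N" * 5 + "TTGCATGCAA"
for gap in find_gaps(scaffold, flank_len=4):
    print(gap)  # → Gap(start=10, end=15, left_flank='GTAC', right_flank='TTGC')
```

A real pipeline would read scaffolds from FASTA files; the 4-base flank here only keeps the output readable. Note that when two gaps sit close together, a flank can itself contain `N` bases.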

## Contribution

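DL-GapFilling introduces a novel Deep Filling Neural Network model that extracts context from the sequence flanking each gap, together with the BeamStar contraction-expand search algorithm and a PredictionFilter mechanism, to make genome gap filling more accurate and efficient.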
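This summary does not say how the Deep Filling Neural Network encodes its input. A common convention for feeding DNA to a neural network is one-hot encoding, sketched below under that assumption: each base becomes a 4-element indicator vector in A, C, G, T order, and `N` or any other ambiguous symbol becomes an all-zero row. `one_hot` and `encode_context` are hypothetical helpers, and stacking the two flanks is an illustrative choice, not the paper's design.

```python
from typing import List

BASES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as a len(seq) x 4 matrix (columns A, C, G, T).

    N and any other ambiguous symbol map to an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in BASE_INDEX:
            row[BASE_INDEX[base]] = 1
        rows.append(row)
    return rows


def encode_context(left_flank: str, right_flank: str) -> List[List[int]]:
    """Hypothetical model input: both flanks one-hot encoded and stacked row-wise."""
    return one_hot(left_flank) + one_hot(right_flank)


print(one_hot("ACN"))  # → [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```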

## Key findings

- DL-GapFilling outperforms traditional tools, filling 5.5% to 23.5% more gaps across five plant and algal genome datasets.
- The framework improves both efficiency and accuracy compared to existing deep learning-based methods.
- A PredictionFilter mechanism improves assembly quality by retaining only high-confidence predictions, limiting the impact of poorly predicted sequences.
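The summary does not state which confidence criterion PredictionFilter applies. One plausible reading, sketched here purely as an assumption, scores each predicted fill by the geometric mean of its per-base probabilities and keeps only fills that clear a threshold, leaving the other gaps unfilled rather than inserting a doubtful sequence. `Prediction`, `base_probs`, `confidence`, `filter_predictions`, and the 0.9 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    gap_id: str
    sequence: str
    base_probs: List[float]  # model probability of each predicted base (hypothetical)


def confidence(pred: Prediction) -> float:
    """Geometric mean of the per-base probabilities, i.e. exp(mean log-prob)."""
    if not pred.base_probs or min(pred.base_probs) <= 0.0:
        return 0.0
    mean_log = sum(math.log(p) for p in pred.base_probs) / len(pred.base_probs)
    return math.exp(mean_log)


def filter_predictions(preds: List[Prediction], threshold: float = 0.9) -> List[Prediction]:
    """Keep fills whose confidence reaches the threshold; the rest stay unfilled."""
    return [p for p in preds if confidence(p) >= threshold]


preds = [
    Prediction("gap1", "ACGT", [0.99, 0.98, 0.97, 0.99]),
    Prediction("gap2", "TTAA", [0.90, 0.40, 0.95, 0.90]),
]
print([p.gap_id for p in filter_predictions(preds)])  # → ['gap1']
```

The geometric mean keeps long fills from being penalized merely for their length (as a raw product of probabilities would), while a single very unlikely base still pulls the score down, which is why `gap2` is rejected.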

## Abstract

Genome assembly has been a cornerstone of bioinformatics for decades, with faster and more accurate assembly of unknown genomes remaining a critical challenge. However, genome diversity, structural variations, insufficient sequencing depth, and limitations of current algorithms often lead to numerous gaps during assembly, hindering the construction of high-quality reference genomes. While various assembly methods and software tools have been developed, most exhibit low efficiency in gap filling and fail to account for the intrinsic structural properties of genomic sequences. Here, we present DL-GapFilling, a deep learning-based framework for genome assembly and gap filling. DL-GapFilling leverages a novel Deep Filling Neural Network model to efficiently extract and contextualize flanking sequence information, and incorporates the BeamStar contraction-expand algorithm, which integrates a redefined cost function, an enhanced search strategy, and genomic structural priors to improve both generalization and efficiency in gap filling. In addition, a PredictionFilter mechanism is introduced to selectively retain high-confidence predictions, mitigating the impact of poorly predicted sequences on assembly quality. Experimental results demonstrate that DL-GapFilling significantly improves gap-filling performance across multiple plant or algal genome datasets, achieving increases of 15.6%, 6.1%, 16.7%, 5.5%, and 23.5% in the number of gaps filled compared to traditional tools, and outperforming existing DL-based methods in both efficiency and accuracy. These findings underscore the potential of DL-GapFilling as a powerful tool for advancing genome assembly research.
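The abstract describes BeamStar only at a high level (a redefined cost function, an enhanced search strategy, genomic structural priors, and a contraction-expand step), so none of that is reproduced here. Judging only by its name, which suggests a relation to beam search, the sketch below shows plain beam search for orientation: it extends the left flank one base at a time, keeps the `beam_width` highest-scoring partial fills by summed log-probability, and stops when a fill ends with the first `anchor` bases of the right flank. `make_toy_model` is a hypothetical stand-in for a trained network that simply looks up k-mers in a known sequence; `beam_fill`, `anchor`, the stopping rule, and all parameter values are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# A next-base model maps the sequence so far to a probability for each base.
NextBaseModel = Callable[[str], Dict[str, float]]


def make_toy_model(reference: str, k: int = 4) -> NextBaseModel:
    """Hypothetical stand-in for a trained network: find the context's last k
    bases in a known reference and favour the base that follows them."""
    def model(context: str) -> Dict[str, float]:
        pos = reference.find(context[-k:])
        if pos == -1 or pos + k >= len(reference):
            return {b: 0.25 for b in "ACGT"}  # k-mer unseen: uniform guess
        probs = {b: 0.05 for b in "ACGT"}
        probs[reference[pos + k]] = 0.85
        return probs
    return model


def beam_fill(left_flank: str, right_flank: str, model: NextBaseModel,
              beam_width: int = 3, max_len: int = 50,
              anchor: int = 4) -> Optional[Tuple[str, float]]:
    """Plain beam search: grow a fill after the left flank until it ends with
    the first `anchor` bases of the right flank. Returns (fill, summed
    log-prob including the anchor bases), or None if no beam closes the gap."""
    target = right_flank[:anchor]
    if not target:
        raise ValueError("right flank must supply at least one anchor base")
    beams: List[Tuple[str, float]] = [("", 0.0)]  # (fill so far, summed log-prob)
    for _ in range(max_len):
        candidates = []
        for fill, score in beams:
            for base, p in model(left_flank + fill).items():
                if p > 0:
                    candidates.append((fill + base, score + math.log(p)))
        # Keep only the beam_width best partial fills (stable sort keeps ties in order).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        for fill, score in beams:  # best-scoring beam is checked first
            if fill.endswith(target):
                return fill[:-len(target)], score  # drop the anchor bases
    return None


left, true_fill, right = "ACGTTGCA", "GGATCC", "TTAGCATG"
result = beam_fill(left, right, make_toy_model(left + true_fill + right))
print(result[0] if result else "gap not closed")  # → GGATCC
```

With the toy model, the search recovers `GGATCC`, the bases removed between the two flanks. A trained model would supply learned probabilities instead, and a production gap filler would also need to handle fills that never reach the right flank within `max_len`.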

## Full-text entities

- **Methods and tools:** LSTM (long short-term memory network), DLGapCloser
- **Species:** Thalictrum thalictroides (rue-anemone, species) [taxon 46969], Utricularia gibba (humped bladderwort, species) [taxon 13748], Eutrema salsugineum (saltwater cress, species) [taxon 72664], Micromonas pusilla (species) [taxon 38833], Oryza longistaminata (longstamen rice, species) [taxon 4528]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12834303/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12834303/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC12834303/full.md

---
Source: https://tomesphere.com/paper/PMC12834303