# Utilizing Deep Neural Networks to Fill Gaps in Small Genomes

**Authors:** Yu Chen, Gang Wang, Tianjiao Zhang

PMC · DOI: 10.3390/ijms25158502 · International Journal of Molecular Sciences · 2024-08-04

## TL;DR

This paper introduces DLGapCloser, a deep learning method that helps traditional assembly tools fill more gaps in small genomes. It combines the DGCNet model, the Wave-Beam Search prediction algorithm, and a new evaluation method.

## Contribution

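DLGapCloser contributes DGCNet, a deep learning model that learns context from the sequences flanking each gap, and Wave-Beam Search, a prediction algorithm that alternates between expansion and contraction phases to avoid the early pruning and high memory usage of Beam Search.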
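
Gap filling of this kind starts from the sequence on either side of each gap. The following is a minimal sketch of that preprocessing step, not the authors' code. It assumes gaps are encoded as runs of `N` in a scaffold string, which is the common assembly convention; the function names `find_gaps` and `extract_flanks` and the `flank_len` parameter are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def find_gaps(scaffold: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of N-runs in the scaffold, end exclusive."""
    return [m.span() for m in re.finditer(r"[Nn]+", scaffold)]


def extract_flanks(scaffold: str, flank_len: int = 4) -> List[Dict[str, object]]:
    """Collect the left and right context around every gap.

    flank_len is an illustrative choice; a flank is shorter than flank_len
    when the gap sits close to either end of the scaffold.
    """
    records = []
    for start, end in find_gaps(scaffold):
        records.append({
            "start": start,
            "length": end - start,
            "left": scaffold[max(0, start - flank_len):start],
            "right": scaffold[end:end + flank_len],
        })
    return records


example = "ACGTACGTNNNNTTGACCANNGGA"
gaps = find_gaps(example)
flank_records = extract_flanks(example, flank_len=4)
```

Each record pairs a gap's position and length with the context a model would condition on. Note that a flank can itself contain `N` characters when two gaps lie closer together than `flank_len`.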

## Key findings

- Wave-Beam Search improved the gap-filling performance of assembly tools by 7.35%, 28.57%, 42.85%, and 8.33% over their original results.
- DLGapCloser increased the number of filled gaps by 1.4% to 15.3% across the four tested genomes (8.05%, 15.3%, 1.4%, and 7%) compared to traditional assembly tools.
- A new evaluation method was developed and validated on four species.

## Abstract

With the widespread adoption of next-generation sequencing technologies, the speed and convenience of genome sequencing have significantly improved, and many biological genomes have been sequenced. However, during the assembly of small genomes, we still face a series of challenges, including repetitive fragments, inverted repeats, low sequencing coverage, and the limitations of sequencing technologies. These challenges lead to unknown gaps in small genomes, hindering complete genome assembly. Although there are many existing assembly software options, they do not fully utilize the potential of artificial intelligence technologies, resulting in limited improvement in gap filling. Here, we propose a novel method, DLGapCloser, based on deep learning, aimed at assisting traditional tools in further filling gaps in small genomes. Firstly, we created four datasets based on the original genomes of Saccharomyces cerevisiae, Schizosaccharomyces pombe, Neurospora crassa, and Micromonas pusilla. To further extract effective information from the gene sequences, we also added homologous genomes to enrich the datasets. Secondly, we proposed the DGCNet model, which effectively extracts features and learns context from sequences flanking gaps. Addressing issues with early pruning and high memory usage in the Beam Search algorithm, we developed a new prediction algorithm, Wave-Beam Search. This algorithm alternates between expansion and contraction phases, enhancing efficiency and accuracy. Experimental results showed that the Wave-Beam Search algorithm improved the gap-filling performance of assembly tools by 7.35%, 28.57%, 42.85%, and 8.33% on the original results. Finally, we established new gap-filling standards and created and implemented a novel evaluation method. 
Validation on the genomes of Saccharomyces cerevisiae, Schizosaccharomyces pombe, Neurospora crassa, and Micromonas pusilla showed that DLGapCloser increased the number of filled gaps by 8.05%, 15.3%, 1.4%, and 7% compared to traditional assembly tools.
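
The abstract describes Wave-Beam Search only as alternating between expansion and contraction phases. The sketch below is one possible reading of that idea, not the published algorithm: it assumes the beam width simply switches between a `wide` and a `narrow` value on a fixed `period`, so that weak-looking candidates survive for a few steps before the beam is pruned again. The parameters `narrow`, `wide`, and `period`, and the `toy_scorer` standing in for DGCNet, are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
# A scorer maps the sequence so far to a log-probability for each next base.
Scorer = Callable[[str], Dict[str, float]]


def toy_scorer(context: str) -> Dict[str, float]:
    """Stand-in for a trained model: prefers the next base in A->C->G->T->A order."""
    last = context[-1:]
    if len(last) != 1 or last not in BASES:
        return {b: math.log(0.25) for b in BASES}  # no usable context: uniform
    preferred = BASES[(BASES.index(last) + 1) % 4]
    return {b: math.log(0.7 if b == preferred else 0.1) for b in BASES}


def wave_beam_search(left_flank: str, gap_len: int, scorer: Scorer,
                     narrow: int = 2, wide: int = 8,
                     period: int = 4) -> List[Tuple[str, float]]:
    """Fill gap_len bases after left_flank with an oscillating beam width."""
    beam: List[Tuple[str, float]] = [("", 0.0)]
    for step in range(gap_len):
        candidates = []
        for seq, score in beam:
            logps = scorer(left_flank + seq)
            for base in BASES:
                candidates.append((seq + base, score + logps[base]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Assumed schedule: first half of each period expands, second half contracts.
        expanding = (step % period) < period // 2
        beam = candidates[:wide if expanding else narrow]
    return beam


result = wave_beam_search("ACGT", 6, toy_scorer)
best_fill, best_score = result[0]
```

With the toy scorer, the highest-scoring fill simply continues the cyclic pattern of the left flank. A real use would replace `toy_scorer` with a trained model and would likely also score candidates against the right flank, which this sketch omits.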

## Linked entities

- **Species:** Saccharomyces cerevisiae (taxon 4932), Schizosaccharomyces pombe (taxon 4896), Neurospora crassa (taxon 5141), Micromonas pusilla (taxon 38833)

## Full-text entities

- **Software:** DLGapCloser
- **Species:** Schizosaccharomyces pombe (fission yeast, species) [taxon 4896], Neurospora crassa (species) [taxon 5141], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Micromonas pusilla (species) [taxon 38833]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11313336/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11313336/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC11313336/full.md

---
Source: https://tomesphere.com/paper/PMC11313336