# Improving spliced alignment by modeling splice sites with deep learning

**Authors:** Siying Yang, Neng Huang, Heng Li

PMC · DOI: 10.1186/s13015-025-00293-7 · Algorithms for Molecular Biology : AMB · 2026-01-02

## TL;DR

This paper introduces a deep learning model to improve the accuracy of spliced alignment by better modeling splice sites in genomes.

## Contribution

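The authors introduce minisplice, which learns conserved splice signals with a compact one-dimensional convolutional neural network (1D-CNN). Its precomputed splicing probabilities are then used by the aligners minimap2 and miniprot to improve spliced alignment accuracy.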

## Key findings

- The model captures conserved splice signals across vertebrates and insects.
- It reveals GC-rich introns specific to mammals and birds.
- Modifying minimap2 and miniprot to use the predicted splicing probabilities improved junction accuracy, especially for noisy long RNA-seq reads and proteins of distant homology.
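The model summarized above learns splice signals with a 1D-CNN and scores every GT and AG in a genome. The sketch below shows the general shape of such a pipeline in plain Python, under loudly stated assumptions: each candidate site is represented by its dinucleotide plus a fixed number of flanking bases (the flank of 6, the single convolution layer, and the filter sizes are arbitrary), the weights are random rather than trained, and the architecture is far smaller than minisplice's 7026-parameter model and does not reproduce it. The names `candidate_sites`, `random_params`, `n_params`, and `predict` are hypothetical.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element vectors (A, C, G, T); other bases such as N become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def candidate_sites(genome, flank):
    """Yield (kind, pos, window) for every GT (donor) and AG (acceptor).

    pos is the 0-based index of the dinucleotide's first base; window is the
    dinucleotide plus `flank` bases on each side. Sites too close to either
    end of the sequence are skipped (an assumption of this sketch)."""
    g = genome.upper()
    for i in range(len(g) - 1):
        di = g[i:i + 2]
        if di not in ("GT", "AG"):
            continue
        lo, hi = i - flank, i + 2 + flank
        if lo < 0 or hi > len(g):
            continue
        yield ("donor" if di == "GT" else "acceptor", i, g[lo:hi])


def random_params(n_filters=8, width=5, seed=0):
    """Hypothetical, untrained weights: n_filters convolution filters of shape width x 4."""
    rng = random.Random(seed)
    return {
        "filters": [[[rng.gauss(0.0, 0.1) for _ in BASES] for _ in range(width)]
                    for _ in range(n_filters)],
        "biases": [0.0] * n_filters,
        "out_weights": [rng.gauss(0.0, 0.1) for _ in range(n_filters)],
        "out_bias": 0.0,
    }


def n_params(p):
    """Count trainable parameters: filter weights, filter biases, output weights, output bias."""
    conv = sum(len(f) * len(f[0]) for f in p["filters"])
    return conv + len(p["biases"]) + len(p["out_weights"]) + 1


def predict(window, p):
    """Conv1D -> ReLU -> global max pool -> linear -> sigmoid; returns a value in (0, 1)."""
    x = one_hot(window)
    width = len(p["filters"][0])
    pooled = []
    for f, b in zip(p["filters"], p["biases"]):
        # ReLU activation of this filter at every valid offset, then keep the strongest one.
        acts = [max(0.0, b + sum(f[j][c] * x[t + j][c] for j in range(width) for c in range(4)))
                for t in range(len(x) - width + 1)]
        pooled.append(max(acts, default=0.0))
    logit = p["out_bias"] + sum(w * v for w, v in zip(p["out_weights"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    genome = "TTCAGGTAAGTACCTTTTTTCAGGTCC"
    params = random_params()
    print("parameters:", n_params(params))
    for kind, pos, win in candidate_sites(genome, flank=6):
        print(kind, pos, win, round(predict(win, params), 3))
```

With trained weights, `predict` would be run once per candidate and the results stored, so an aligner only needs to look up a precomputed value per site. Global max pooling keeps the sketch short; it lets a filter respond wherever its motif occurs in the window, at the cost of discarding positional information that a real splice model may rely on.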

## Abstract

Spliced alignment refers to the alignment of messenger RNA (mRNA) or protein sequences to eukaryotic genomes. It plays a critical role in gene annotation and the study of gene function. Accurate spliced alignment demands sophisticated modeling of splice sites, but current aligners use simple models, which may limit their accuracy when aligning dissimilar sequences.
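A minimal sketch of the kind of simple splice model the abstract contrasts with minisplice, assuming the aligner only checks whether an intron starts with GT and ends with AG and charges one flat cost otherwise. The function name `intron_penalty` and both penalty values are illustrative assumptions, not taken from minimap2 or miniprot.

```python
# Hypothetical toy model: NOT minimap2's or miniprot's actual code.
CANONICAL_PENALTY = 0      # assumed: no extra cost for a GT...AG intron
NONCANONICAL_PENALTY = 9   # assumed: one flat cost for every other donor/acceptor pair


def intron_penalty(genome: str, start: int, end: int) -> int:
    """Extra cost for an intron occupying genome[start:end] (0-based, half-open).

    Only the first two and last two intronic bases are inspected; the rest of
    the sequence context around the junction is ignored."""
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    if donor == "GT" and acceptor == "AG":
        return CANONICAL_PENALTY
    return NONCANONICAL_PENALTY


if __name__ == "__main__":
    genome = "CCGTAAGTTTTCAGCC"
    # Two candidate donors (GT at 2 and GT at 6) share one acceptor (AG at 12-13).
    print(intron_penalty(genome, 2, 14), intron_penalty(genome, 6, 14))  # prints "0 0": a tie
    print(intron_penalty(genome, 4, 14))  # "AA" donor is non-canonical: prints 9
```

Because every GT...AG intron receives the same score, the two candidate donors in the example are indistinguishable. When a noisy read or a divergent protein leaves the exact junction position ambiguous, a model like this gives the aligner nothing to break the tie; a per-site splicing probability is one way to supply that missing signal.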

We implemented minisplice to learn splice signals with a one-dimensional convolutional neural network (1D-CNN) and trained a model with 7026 parameters for vertebrate and insect genomes. It captures conserved splice signals across phyla and reveals GC-rich introns specific to mammals and birds. We used this model to estimate the empirical splicing probability for every GT and AG in genomes, and modified minimap2 and miniprot to leverage pre-computed splicing probability during alignment. Evaluation on human long-read RNA-seq data and cross-species protein datasets showed that our method greatly improves junction accuracy, especially for noisy long RNA-seq reads and proteins of distant homology.

https://github.com/lh3/minisplice
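The abstract says minimap2 and miniprot were modified to use pre-computed splicing probabilities during alignment, but this summary does not say how a probability enters the alignment score. A minimal sketch, assuming one common choice: a scaled log-odds ratio against a baseline prior, rounded to an integer and capped. `splice_score` and its `prior`, `scale`, and `cap` parameters are hypothetical; the conversion used by the modified aligners may differ, so the repository above is the authority.

```python
import math


def splice_score(prob, prior=0.5, scale=3.0, cap=15):
    """Hypothetical mapping from a predicted splicing probability to an integer score adjustment.

    Returns scale * (log-odds(prob) - log-odds(prior)), rounded and clamped to
    [-cap, cap]: positive for sites that look more splice-like than the prior,
    negative for less. prior, scale and cap are illustrative assumptions."""
    eps = 1e-6
    p = min(max(prob, eps), 1.0 - eps)  # keep the logarithm finite at 0 and 1

    def log_odds(q):
        return math.log2(q / (1.0 - q))

    s = round(scale * (log_odds(p) - log_odds(prior)))
    return max(-cap, min(cap, s))


if __name__ == "__main__":
    for prob in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(prob, splice_score(prob))
```

Log-odds is used in this sketch because it adds naturally to other log-scale alignment scores, and the cap keeps a single overconfident prediction from dominating an alignment.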

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12766944/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12766944/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12766944/full.md

---
Source: https://tomesphere.com/paper/PMC12766944