Improving spliced alignment by modeling splice sites with deep learning

Siying Yang; Neng Huang; Heng Li

arXiv:2506.12986·q-bio.GN·September 23, 2025·Algorithms Mol. Biol.

TL;DR

This paper introduces minisplice, a tool that models splice sites with a small convolutional neural network to improve spliced alignment accuracy, especially for noisy long RNA-seq reads and distantly homologous protein sequences.

Contribution

The authors trained a 1D-CNN to learn conserved splice signals and modified two existing aligners, minimap2 and miniprot, to use the model's pre-computed splicing probabilities during alignment.

Findings

01. Junction accuracy improves significantly for noisy long RNA-seq reads.

02. The model captures conserved splice signals across phyla (it was trained on vertebrate and insect genomes).

03. The model reveals GC-rich introns specific to mammals and birds.

Abstract

Motivation: Spliced alignment refers to the alignment of messenger RNA (mRNA) or protein sequences to eukaryotic genomes. It plays a critical role in gene annotation and the study of gene functions. Accurate spliced alignment demands sophisticated modeling of splice sites, but current aligners use simple models, which may affect their accuracy given dissimilar sequences.

Results: We implemented minisplice to learn splice signals with a one-dimensional convolutional neural network (1D-CNN) and trained a model with 7,026 parameters for vertebrate and insect genomes. It captures conserved splice signals across phyla and reveals GC-rich introns specific to mammals and birds. We used this model to estimate the empirical splicing probability for every GT and AG in genomes, and modified minimap2 and miniprot to leverage pre-computed splicing probability during alignment. Evaluation on human…
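To make "every GT and AG in genomes" concrete, here is a minimal Python sketch of how candidate donor (GT) and acceptor (AG) positions could be enumerated on the forward strand, and how a window around each candidate could be one-hot encoded as input for a 1D-CNN. This is not the minisplice implementation: the function names, the coordinate convention, and the `flank` window size are assumptions made for illustration only.

```python
BASES = "ACGT"


def find_candidate_sites(seq):
    """Return (donors, acceptors) as 0-based forward-strand positions.

    Assumed convention (hypothetical, not taken from minisplice):
    a donor is reported at the G of a GT dinucleotide, and an
    acceptor at the G of an AG dinucleotide.
    """
    seq = seq.upper()
    donors, acceptors = [], []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair == "GT":
            donors.append(i)
        elif pair == "AG":
            acceptors.append(i + 1)
    return donors, acceptors


def one_hot_window(seq, center, flank=4):
    """One-hot encode the bases in [center - flank, center + flank].

    Returns a list of 2 * flank + 1 rows with 4 columns (A, C, G, T).
    Positions outside the sequence, or bases other than A/C/G/T,
    become all-zero rows. The window size is an illustrative choice.
    """
    seq = seq.upper()
    rows = []
    for pos in range(center - flank, center + flank + 1):
        row = [0, 0, 0, 0]
        if 0 <= pos < len(seq):
            col = BASES.find(seq[pos])
            if col >= 0:
                row[col] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    example = "CAGGTAAGT"
    donors, acceptors = find_candidate_sites(example)
    print("donor candidates:", donors)
    print("acceptor candidates:", acceptors)
    print("window around first donor:", one_hot_window(example, donors[0], flank=1))
```

A trained model would then map each encoded window to a splicing probability. A real pipeline would also need to scan the reverse strand, which this sketch omits.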

Code & Models

Repositories

lh3/minisplice
Official