TL;DR
This paper introduces minisplice, a deep-learning model that improves spliced alignment accuracy by modeling splice sites more carefully, especially for noisy long-read RNA-seq data and distant-homology protein datasets.
Contribution
The authors developed a one-dimensional CNN with 7,026 parameters to learn conserved splice signals, and modified minimap2 and miniprot to use its pre-computed splicing probabilities during alignment.
Findings
Significant improvement in junction accuracy for noisy long RNA-seq reads.
Conserved splice signals captured across phyla by a model trained on vertebrate and insect genomes.
GC-rich introns revealed that are specific to mammals and birds.
Abstract
Motivation: Spliced alignment refers to the alignment of messenger RNA (mRNA) or protein sequences to eukaryotic genomes. It plays a critical role in gene annotation and the study of gene functions. Accurate spliced alignment demands sophisticated modeling of splice sites, but current aligners use simple models, which may affect their accuracy given dissimilar sequences. Results: We implemented minisplice to learn splice signals with a one-dimensional convolutional neural network (1D-CNN) and trained a model with 7,026 parameters for vertebrate and insect genomes. It captures conserved splice signals across phyla and reveals GC-rich introns specific to mammals and birds. We used this model to estimate the empirical splicing probability for every GT and AG in genomes, and modified minimap2 and miniprot to leverage pre-computed splicing probability during alignment. Evaluation on human…
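The scoring step described in the abstract (visit every GT and AG in a genome and score it with a 1D convolution) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the single hand-written filter stands in for the trained 7,026-parameter model, only the forward strand is scanned, and the sigmoid output is an uncalibrated score rather than the empirical splicing probability the authors compute.

```python
import math

# One-hot rows in A, C, G, T order; any other base (e.g. N) maps to all zeros.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def find_candidate_sites(seq):
    """Return (donors, acceptors) as 0-based forward-strand positions.

    A donor is the index of the G in a GT dinucleotide; an acceptor is the
    index of the G in an AG dinucleotide.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def window_around(seq, pos, flank):
    """Extract 2*flank+1 bases centred on pos, padding the ends with N."""
    padded = "N" * flank + seq.upper() + "N" * flank
    return padded[pos:pos + 2 * flank + 1]


def conv1d_score(window, kernel, bias=0.0):
    """Apply one 1D convolution filter, max-pool over positions, then sigmoid.

    kernel is a list of 4-tuples (weights for A, C, G, T), one per filter column.
    """
    x = [ONE_HOT.get(base, (0, 0, 0, 0)) for base in window]
    width = len(kernel)
    activations = []
    for start in range(len(x) - width + 1):
        total = bias
        for j in range(width):
            total += sum(a * w for a, w in zip(x[start + j], kernel[j]))
        activations.append(total)
    return 1.0 / (1.0 + math.exp(-max(activations)))


# Toy filter that responds to "GT"; a trained model would learn its weights.
TOY_GT_KERNEL = [(0, 0, 1, 0), (0, 0, 0, 1)]

if __name__ == "__main__":
    genome = "CAGGTAAGT"
    donors, acceptors = find_candidate_sites(genome)
    for pos in donors:
        win = window_around(genome, pos, flank=2)
        print(pos, win, round(conv1d_score(win, TOY_GT_KERNEL), 4))
```

Scores computed this way could be stored per genomic position and looked up by an aligner when it considers an intron boundary, which is the general idea behind pre-computing them once per genome.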
