# Conservation assessment of human splice site annotation based on a 470-genome alignment

**Authors:** Ilia Minkin, Steven L Salzberg

PMC · DOI: 10.1093/nar/gkaf184 · 2025-03-22

## TL;DR

This paper assesses the accuracy of human splice site annotations by analyzing their evolutionary conservation across 350+ species.

## Contribution

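The paper introduces a method for identifying well-supported splice sites: a logistic regression classifier is trained on conservation patterns to separate MANE splice sites from sites sampled at random from neutrally evolving sequences.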
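As a rough illustration of the classifier idea, the sketch below fits a logistic regression by gradient descent on synthetic data. Everything in it is an assumption made for illustration: the two features (fractions of aligned species that preserve the splice-site dinucleotide in two hypothetical clades), the synthetic value ranges, and all function names. The paper's actual features, training data, and implementation are not described in this summary.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(features[0])
    n = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability that a site looks like a conserved (MANE-like) site."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def make_synthetic(n_per_class=200, seed=0):
    # Hypothetical data: label 1 mimics annotated sites with high
    # conservation, label 0 mimics random sites in neutral sequence.
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_per_class):
        features.append([rng.uniform(0.7, 1.0), rng.uniform(0.6, 1.0)])
        labels.append(1)
        features.append([rng.uniform(0.0, 0.4), rng.uniform(0.0, 0.5)])
        labels.append(0)
    return features, labels

features, labels = make_synthetic()
weights, bias = train_logistic(features, labels)
accuracy = sum(
    (predict_proba(weights, bias, x) >= 0.5) == bool(y)
    for x, y in zip(features, labels)
) / len(labels)
```

A site would then be called well-supported when `predict_proba` exceeds a chosen threshold; the 0.5 cutoff used here is arbitrary and not taken from the paper.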

## Key findings

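- Splice sites from protein-coding genes in the MANE annotation are consistently conserved across more than 350 species.
- A logistic regression model separates the conservation pattern of MANE splice sites from that of sites chosen randomly from neutrally evolving sequences; sites it classifies as well-supported have lower single nucleotide polymorphism rates and better transcriptomic evidence.
- The subset of transcripts that use only well-supported or MANE splice sites is enriched in high-confidence transcripts from the major gene catalogs, which appear to be under purifying selection.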

## Abstract

Despite many improvements over the years, the annotation of the human genome remains imperfect. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation. Using the latest whole-genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across >350 species. We also studied splice sites from the RefSeq, GENCODE, and CHESS databases not present in MANE. In addition, we analyzed the completeness of the alignment with respect to the human genome annotations and described a method that would allow us to fix up to 60% of the missing alignments of the protein-coding exons. We trained a logistic regression classifier to distinguish between the conservation exhibited by sites from MANE versus sites chosen randomly from neutrally evolving sequences. We found that splice sites classified by our model as well-supported have lower single nucleotide polymorphism rates and better transcriptomic evidence. We then computed a subset of transcripts using only “well-supported” splice sites or ones from MANE. This subset is enriched in high-confidence transcripts of the major gene catalogs that appear to be under purifying selection and are more likely to be correct and functionally relevant.
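The final filtering step described above, keeping transcripts that use only well-supported or MANE splice sites, can be sketched as a set-membership check. The site representation `(chromosome, position, strand, kind)`, the function name, and the example identifiers are hypothetical; treating transcripts without any splice site as excluded is also an assumption, since the summary does not say how single-exon transcripts are handled.

```python
def select_supported_transcripts(transcripts, mane_sites, supported_sites):
    """Return IDs of transcripts whose splice sites are all in the allowed set.

    transcripts: dict mapping transcript ID -> set of splice-site tuples.
    mane_sites, supported_sites: sets of splice-site tuples.
    """
    allowed = mane_sites | supported_sites
    selected = []
    for transcript_id, sites in transcripts.items():
        # Assumption: transcripts with no splice sites are skipped.
        if sites and sites <= allowed:
            selected.append(transcript_id)
    return sorted(selected)

# Hypothetical toy data: sites are (chromosome, position, strand, kind).
mane = {("chr1", 100, "+", "donor"), ("chr1", 200, "+", "acceptor")}
supported = {("chr1", 300, "+", "donor"), ("chr1", 400, "+", "acceptor")}
toy_transcripts = {
    "tx_mane_only": {("chr1", 100, "+", "donor"), ("chr1", 200, "+", "acceptor")},
    "tx_mixed": {("chr1", 100, "+", "donor"), ("chr1", 400, "+", "acceptor")},
    "tx_unsupported": {("chr1", 100, "+", "donor"), ("chr1", 999, "+", "acceptor")},
    "tx_single_exon": set(),
}
kept = select_supported_transcripts(toy_transcripts, mane, supported)
```

With this toy input, `tx_unsupported` is dropped because one of its sites appears in neither set, and `tx_single_exon` is dropped by the stated assumption.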


## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11928937/full.md

---
Source: https://tomesphere.com/paper/PMC11928937