Conservation assessment of human splice site annotation based on a 470-genome alignment
Ilia Minkin, Steven L Salzberg

TL;DR
This paper assesses the accuracy of human splice site annotations by analyzing their evolutionary conservation across 350+ species.
Contribution
A new method to identify well-supported splice sites using conservation patterns and a logistic regression classifier.
Findings
Splice sites in the MANE annotation are consistently conserved across over 350 species.
A logistic regression model distinguishes the conservation of MANE splice sites from that of sites chosen randomly from neutrally evolving sequences.
Transcripts using well-supported splice sites are enriched in high-confidence, functionally relevant genes.
Abstract
Despite many improvements over the years, the annotation of the human genome remains imperfect. The use of evolutionarily conserved sequences provides a strategy for selecting a high-confidence subset of the annotation. Using the latest whole-genome alignment, we found that splice sites from protein-coding genes in the high-quality MANE annotation are consistently conserved across >350 species. We also studied splice sites from the RefSeq, GENCODE, and CHESS databases not present in MANE. In addition, we analyzed the completeness of the alignment with respect to the human genome annotations and described a method that would allow us to fix up to 60% of the missing alignments of the protein-coding exons. We trained a logistic regression classifier to distinguish between the conservation exhibited by sites from MANE versus sites chosen randomly from neutrally evolving sequences. We found…
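The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single hypothetical feature (the fraction of aligned species that share the human splice-site dinucleotide), uses made-up toy values rather than data from the paper, and fits a plain logistic regression by gradient descent. All function names and parameters here are illustrative assumptions.

```python
import math

def conservation_fraction(column, human_dinucleotide):
    """Fraction of aligned species whose dinucleotide matches the human one.

    `column` holds one dinucleotide per species, or None where the species
    has no alignment at this position (hypothetical input format).
    """
    aligned = [d for d in column if d is not None]
    if not aligned:
        return 0.0
    return sum(d == human_dinucleotide for d in aligned) / len(aligned)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. z
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Probability that a site with conservation fraction x is well supported."""
    return sigmoid(w * x + b)

# Toy values, invented for illustration only (not results from the paper):
# label 1 = reference-annotation-like sites, label 0 = random neutral sites.
xs = [0.98, 0.95, 0.91, 0.88, 0.85, 0.35, 0.22, 0.15, 0.08, 0.02]
ys = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
w, b = train_logistic(xs, ys)

# Score a candidate site from a small hypothetical alignment column.
candidate = conservation_fraction(["GT", "GT", "GT", "GC", None], "GT")
score = predict(w, b, candidate)
```

In this sketch a site is scored by how closely its conservation resembles the positive class; a real analysis would likely use richer features per site and held-out evaluation, which the abstract does not detail.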
Taxonomy
Topics: Genomics and Phylogenetic Studies · RNA Research and Splicing · RNA and protein synthesis mechanisms
