# Quantifying annotation-driven bias in alternative splicing from EGAP metadata

**Authors:** Rebeca de la Fuente, Wladimiro Díaz-Villanueva, Vicente Arnau, Andrés Moya

PMC12605758 · DOI: 10.1093/nargab/lqaf141 · NAR Genomics and Bioinformatics · 2025-11-11

## TL;DR

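Across 670 multicellular eukaryotes annotated by the NCBI Eukaryotic Genome Annotation Pipeline (EGAP), the percentage of CDSs supported by experimental evidence is the dominant predictor of variation in alternative splicing estimates. The paper introduces a polynomial-regression normalization that corrects this annotation-driven bias for comparative genomics.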

## Contribution

The paper introduces a normalization procedure, based on polynomial regression, that yields an adjusted alternative splicing metric corrected for annotation-driven bias.

## Key findings

- The percentage of CDSs supported by experimental evidence is the dominant predictor of variation in splicing estimates across the 670 species analyzed.
- Assembly quality and raw transcriptomic input play only a minor role.
- The adjusted metric preserves relative splicing complexity across species while mitigating annotation artifacts.
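
One way to picture the comparison of predictors behind these findings is to rank each annotation variable by the variance it explains in the splicing estimate. The sketch below is illustrative only: the data are synthetic, the variable names (`pct_cds_supported`, `contig_n50`, `rna_seq_reads`) are hypothetical stand-ins for EGAP metadata fields, and a univariate R² is an assumed measure, not necessarily the statistic used in the paper.

```python
import numpy as np


def r_squared(x, y, degree=1):
    """Fraction of variance in y explained by a polynomial fit on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rank_predictors(predictors, y, degree=1):
    """Return (name, R^2) pairs sorted from most to least explanatory."""
    scores = {name: r_squared(x, y, degree) for name, x in predictors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumption): splicing depends only on evidence support.
rng = np.random.default_rng(0)
n_species = 200
pct_cds_supported = rng.uniform(0, 100, n_species)   # hypothetical field
contig_n50 = rng.uniform(1e4, 1e7, n_species)        # hypothetical field
rna_seq_reads = rng.uniform(1e6, 1e9, n_species)     # hypothetical field
splicing = 0.02 * pct_cds_supported + rng.normal(0, 0.3, n_species)

ranking = rank_predictors(
    {
        "pct_cds_supported": pct_cds_supported,
        "contig_n50": contig_n50,
        "rna_seq_reads": rna_seq_reads,
    },
    splicing,
)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

Because the synthetic splicing values are generated from evidence support alone, that variable ranks first by construction; with real metadata the ranking would be an empirical result.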

## Abstract

Annotated coding sequences (CDSs) enable genome-wide estimates of alternative splicing. However, the quality and evidence support of these annotations can systematically bias estimates of splicing events across species. Here, we evaluate how annotation-related variables from the NCBI Eukaryotic Genome Annotation Pipeline affect inferred splicing levels. Analyzing 670 multicellular eukaryotes, we find that the percentage of CDSs supported by experimental evidence is the dominant predictor of variation in splicing estimates, whereas assembly quality and raw transcriptomic input play a minor role. To correct this annotation-driven bias, we introduce a normalization procedure based on polynomial regression, yielding an adjusted metric of alternative splicing. This novel metric preserves relative splicing complexity across species while mitigating annotation artifacts, with important implications for comparative genomics.
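
The normalization idea can be sketched as follows: fit a polynomial of the raw splicing estimate on the percentage of CDSs with experimental support, then keep what the fit does not explain. This is a minimal sketch under stated assumptions, not the authors' implementation. The polynomial degree, the residual-based adjustment, the re-centering on the mean, and the function name `adjust_for_annotation_bias` are all assumptions, since this summary does not give the exact formula.

```python
import numpy as np


def adjust_for_annotation_bias(splicing, pct_supported, degree=2):
    """Remove the polynomial trend of splicing on evidence support.

    Assumed form: adjusted = raw - fitted trend + mean(raw), so values
    stay on the original scale. The degree is an illustrative choice.
    """
    splicing = np.asarray(splicing, dtype=float)
    pct_supported = np.asarray(pct_supported, dtype=float)
    coeffs = np.polyfit(pct_supported, splicing, degree)
    expected = np.polyval(coeffs, pct_supported)
    return splicing - expected + splicing.mean()


# Synthetic demonstration data (assumption, not from the paper).
rng = np.random.default_rng(1)
pct_supported_demo = rng.uniform(10, 100, 100)
raw_splicing = (
    1.0
    + 0.05 * pct_supported_demo
    - 0.0002 * pct_supported_demo ** 2
    + rng.normal(0, 0.2, 100)
)

adjusted = adjust_for_annotation_bias(raw_splicing, pct_supported_demo)
print(np.corrcoef(raw_splicing, pct_supported_demo)[0, 1])
print(np.corrcoef(adjusted, pct_supported_demo)[0, 1])
```

Under this form, the adjusted values are linearly uncorrelated with evidence support, while differences between species that the trend does not explain are retained.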

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12605758/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12605758/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12605758/full.md

---
Source: https://tomesphere.com/paper/PMC12605758