# Chimeric mis-annotations of genes remain pervasive in eukaryotic non-model organisms

**Authors:** Andreas Bachler, Thomas K. Walsh, Rahul V. Rane, Gunjan Pandey

PMC · DOI: 10.1186/s12864-025-11765-w · BMC Genomics · 2025-07-01

## TL;DR

This paper shows that chimeric gene mis-annotations, in which two or more adjacent genes are wrongly fused into a single gene model, are common in non-model organisms and can be identified using machine-learning annotation tools such as Helixer.

## Contribution

The study documents how widespread chimeric gene mis-annotations are and highlights machine-learning annotation tools as a way to identify them and improve gene model accuracy.

## Key findings

- 605 confirmed chimeric mis-annotations were identified across 30 recently annotated genomes.
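- The majority of errors occurred in invertebrate and plant genomes rather than vertebrate genomes.
- Machine-learning annotation tools such as Helixer, combined with structural prediction and splicing assessment, can help identify these mis-annotations and refine the affected gene models.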

## Abstract

Accurate annotation of protein-coding genes is critical for genome analysis in non-model organisms. However, limited RNA-Seq data and incomplete protein resources can lead to errors, including chimeric gene mis-annotations, where two or more adjacent genes are incorrectly fused into a single model. These errors often persist because of annotation inertia: mistakes are propagated and amplified through data sharing and reanalysis, leading to cases where the mis-annotated model becomes favoured over the correct one. This complicates almost all downstream genomic analyses, such as gene expression studies and comparative genomics.

We investigated chimeric mis-annotations across 30 recently annotated genomes spanning invertebrates, vertebrates, and plants, identifying 605 confirmed cases. The majority of these errors occurred in invertebrates and plants. Using structural prediction and splicing assessment, we showed that machine-learning annotation tools such as Helixer can be used to identify these mis-annotations.

This study highlights the prevalence of chimeric mis-annotations in genomic datasets and showcases the potential of machine-learning tools such as Helixer to refine gene models in highly variable gene families whose mis-annotations persist in public databases. By addressing these annotation errors, we improve the reliability of genomic data and facilitate a deeper understanding of non-model organisms.
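The abstract names structural prediction and splicing assessment but does not spell out a detection pipeline. As a rough illustration of how an independent machine-learning annotation (for example, Helixer output) could be used to flag *candidate* chimeras, the sketch below marks any existing gene model whose span covers two or more separately predicted genes on the same strand. This overlap heuristic is an assumption for illustration, not the authors' method; `GeneModel`, `overlap_len`, `flag_candidate_chimeras`, and the `min_fraction` threshold are hypothetical, and coordinates are assumed to be already parsed from GFF3 (parsing is not shown).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene record; coordinates are 1-based inclusive, as in GFF3."""
    gene_id: str
    seqid: str
    start: int
    end: int
    strand: str  # '+' or '-'


def overlap_len(a: GeneModel, b: GeneModel) -> int:
    """Number of shared bases between two models on the same sequence and strand."""
    if a.seqid != b.seqid or a.strand != b.strand:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def flag_candidate_chimeras(
    reference: List[GeneModel],
    predicted: List[GeneModel],
    min_fraction: float = 0.5,  # assumed threshold, not taken from the paper
) -> Dict[str, List[str]]:
    """Map each reference gene that spans >= 2 predicted genes to those predicted gene ids.

    A predicted gene counts as covered when at least `min_fraction` of its own
    length lies inside the reference model.
    """
    candidates: Dict[str, List[str]] = {}
    for ref in reference:
        hits = []
        for pred in predicted:
            pred_len = pred.end - pred.start + 1
            if overlap_len(ref, pred) >= min_fraction * pred_len:
                hits.append(pred.gene_id)
        if len(hits) >= 2:
            candidates[ref.gene_id] = sorted(hits)
    return candidates


# Toy example with made-up coordinates: LOC_A spans two predicted genes, LOC_B spans one.
reference = [
    GeneModel("LOC_A", "chr1", 1000, 9000, "+"),
    GeneModel("LOC_B", "chr1", 20000, 24000, "+"),
]
predicted = [
    GeneModel("pred_1", "chr1", 1100, 4000, "+"),
    GeneModel("pred_2", "chr1", 5000, 8800, "+"),
    GeneModel("pred_3", "chr1", 20100, 23900, "+"),
]

print(flag_candidate_chimeras(reference, predicted))  # {'LOC_A': ['pred_1', 'pred_2']}
```

A flagged model is only a candidate: the study reports *confirmed* cases, so anything flagged this way would still need checking, for instance against transcript evidence or predicted protein structures. The pairwise loop is O(n × m), which is fine for a single locus or small contig; genome-wide use would call for position-sorted sweeps or an interval tree.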

The online version contains supplementary material available at 10.1186/s12864-025-11765-w.

## Full-text entities

- **Genes:** CYP6AS7 [NCBI Gene 412936], LOC108003965 (uncharacterized LOC108003965) [NCBI Gene 108003965]
- **Chemicals:** iron (MESH:D007501)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Danio rerio (zebrafish, species) [taxon 7955], Strongylocentrotus purpuratus (purple sea urchin, species) [taxon 7668], Daucus carota (carrot, species) [taxon 4039], Drosophila melanogaster (fruit fly, species) [taxon 7227], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Spodoptera frugiperda (fall armyworm, species) [taxon 7108], Blattella germanica (German cockroach, species) [taxon 6973], Anser cygnoides (Chinese goose, species) [taxon 8845], Bythinella sp. GE (species) [taxon 989193], Aedes aegypti (yellow fever mosquito, species) [taxon 7159], Episyrphus balteatus (marmalade hoverfly, species) [taxon 286459], Apis mellifera (western honey bee, species) [taxon 7460], Apis laboriosa (giant Himalayan honeybee, species) [taxon 183418], Bombyx mori (domestic silkworm, species) [taxon 7091], Tribolium castaneum (red flour beetle, species) [taxon 7070], Apis (genus) [taxon 7459], Apis dorsata (giant honeybee, species) [taxon 7462], Xenopus tropicalis (tropical clawed frog, species) [taxon 8364], Paracentrotus lividus (common sea urchin, species) [taxon 7656], Callorhinchus milii (Australian ghost shark, species) [taxon 7868], Homo sapiens (human, species) [taxon 9606], Drosophila grimshawi (species) [taxon 7222], Rattus rattus (black rat, species) [taxon 10117], Bombus terrestris (buff-tailed bumblebee, species) [taxon 30195], Bos taurus (bovine, species) [taxon 9913], Apis cerana (Asiatic honeybee, species) [taxon 7461], Drosophila ficusphila (species) [taxon 30025], Bactrocera tryoni (Queensland fruit fly, species) [taxon 59916], Gallus gallus (chicken, species) [taxon 9031]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12220653/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12220653/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12220653/full.md

---
Source: https://tomesphere.com/paper/PMC12220653