# Comparative gene annotation and orthology assignments across 301 species of Drosophilidae

**Authors:** Pankaj Dhakad, Bernard Y. Kim, Dmitri A. Petrov, Darren J. Obbard, Richard Hodge

PMC · DOI: 10.1371/journal.pbio.3003663 · PLOS Biology · 2026-02-18

## TL;DR

This study provides protein-coding gene annotations and orthology assignments for 301 Drosophilidae species and finds that codon usage correlates with overall GC content and is strongly shaped by selection.

## Contribution

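The study introduces a comparative gene annotation dataset for 301 species of Drosophilidae, generated with the Comparative Annotation Toolkit (CAT) and BRAKER3, and uses it to show that codon usage evolves slowly, tracks overall GC content, and is strongly shaped by selection.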

## Key findings

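- Gene number and CDS length show moderate phylogenetic heritability (40% and 9.7%, respectively).
- Codon usage is correlated with overall GC content and evolves slowly, but is also strongly shaped by selection: species with the strongest selection on synonymous codon usage generally show the lowest GC bias in third codon positions.
- Species-specific factors, including assembly error, play a substantial role in shaping the observed differences in gene number and CDS length.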
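For readers unfamiliar with the term, phylogenetic heritability in a phylogenetic mixed model is commonly described as the share of trait variance attributable to the phylogeny. The expression below is a generic sketch in its simplest Gaussian form, not necessarily the exact estimator used in the paper. The symbols $\sigma^2_{\text{phylo}}$ (variance explained by shared ancestry) and $\sigma^2_{\text{resid}}$ (residual, species-specific variance) are labels chosen here for illustration.

```latex
H^2 = \frac{\sigma^2_{\text{phylo}}}{\sigma^2_{\text{phylo}} + \sigma^2_{\text{resid}}}
```

Under this reading, the reported 40% for gene number would mean that roughly two fifths of the modelled variance follows the phylogeny, with the remainder attributable to species-specific factors.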

## Abstract

High-quality genome annotations are essential if we are to address central questions in comparative genomics, such as the origin of new genes, the drivers of genome size variation, and the evolutionary forces shaping gene content and structure. Here, we present protein-coding gene annotations for 301 species of the family Drosophilidae, generated using the Comparative Annotation Toolkit (CAT) and BRAKER3, and incorporating available RNA-seq and protein evidence. We take a comparative phylogenetic approach to annotation, with the aim of improving consistency and accuracy, and to generate a robust set of gene annotations and orthology assignments. We analyze our annotations using a phylogenetic mixed-model approach and find that gene number and CDS length exhibit moderate phylogenetic heritability (40% and 9.7%, respectively). For comparison, we also present analyses using a subset of the 215 highest quality genomes, although the findings were not markedly different. Our work suggests that while evolutionary history contributes to variation in these traits, species-specific factors—including assembly error—play a substantial role in shaping observed differences. To illustrate the utility of our annotations for comparative analyses, we investigate codon usage bias and amino acid composition across Drosophilidae. We find that codon usage is correlated with overall GC content and evolves slowly, but that it is also strongly shaped by selection—such that, in general, species with the strongest selection on synonymous codon usage show the lowest GC bias in third codon positions. This comparative annotation dataset forms part of an ongoing collaborative project to sequence and annotate all species of Drosophilidae, with data and annotations being made rapidly and freely available on an ongoing basis. We hope that this effort will serve as a foundation for studies in evolutionary and functional genomics and comparative biology across Drosophilidae.

Ongoing community efforts aim to achieve a comprehensive genomic study of the entire family Drosophilidae. This study presents a comparative gene annotation for 301 species of Drosophilidae and finds that codon usage correlates with overall GC content and evolves slowly, but is also strongly shaped by selection.
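The abstract refers to GC bias in third codon positions and to synonymous codon usage. As a minimal sketch of what those quantities look like for a single coding sequence, the Python below computes the third-position GC fraction (often called GC3) and raw codon counts. The function names `gc3` and `codon_counts` are hypothetical and are not taken from the paper's pipeline. The code assumes the input is an in-frame CDS, and it counts the stop codon if one is present.

```python
from collections import Counter


def _codons(cds: str) -> list[str]:
    """Split an in-frame CDS into codons, dropping any trailing partial codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc3(cds: str) -> float:
    """Fraction of G or C among unambiguous third-codon-position bases."""
    thirds = [codon[2] for codon in _codons(cds)]
    # Ignore ambiguous calls such as N so they do not dilute the estimate.
    called = [base for base in thirds if base in "ACGT"]
    if not called:
        raise ValueError("no unambiguous third-position bases found")
    return sum(base in "GC" for base in called) / len(called)


def codon_counts(cds: str) -> Counter:
    """Count how often each codon occurs in an in-frame CDS."""
    return Counter(_codons(cds))


if __name__ == "__main__":
    example = "ATGGCCAAATAA"  # toy sequence: ATG GCC AAA TAA
    print(gc3(example))
    print(codon_counts(example))
```

Per-gene values like these would typically be aggregated across each species' annotated CDS set before any cross-species comparison. That aggregation step is left out here because the summary does not describe it.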

## Linked entities

- **Species:** Drosophilidae (taxon 7214)

## Full-text entities

- **Diseases:** HPD (MESH:D001851), GLMM (MESH:D004195), HOGs (MESH:D003057)
- **Chemicals:** Poly-A (MESH:D011061), CAT (-), S (MESH:D013455), Amino acid (MESH:D000596), N (MESH:D009584), C (MESH:D002244)
- **Species:** Drosophila rhopaloa (species) [taxon 1041015], Drosophila vulcana (species) [taxon 132243], Musca domestica (house fly, species) [taxon 7370], Drosophila guttifera (species) [taxon 66368], Drosophila pseudoobscura bogotana (subspecies) [taxon 46244], Drosophila suboccidentalis (species) [taxon 198723], Drosophilidae (pomace flies, family) [taxon 7214], Drosophila melanogaster (fruit fly, species) [taxon 7227], Drosophila pseudoobscura (species) [taxon 7237], Cercopithecidae (monkey, family) [taxon 9527], Drosophila fuyamai (species) [taxon 65963], Drosophila americana (species) [taxon 40366], Drosophila bifasciata (species) [taxon 7218], Daphnia pulex (common water flea, species) [taxon 6669], Melanogaster (genus) [taxon 80614], Drosophila punjabiensis (species) [taxon 60717], Drosophila subquinaria (species) [taxon 198725], Drosophila nannoptera (species) [taxon 103845], Drosophila quasianomalipes (species) [taxon 46834], Leucophenga varia (species) [taxon 745178], Drosophila kurseongensis (species) [taxon 1395515], Drosophila takahashii (species) [taxon 29030], Drosophila innubila (species) [taxon 198719], Drosophila ironensis (species) [taxon 2848634], Drosophila neohypocausta (species) [taxon 157057], Drosophila setifemur (species) [taxon 2848635], Drosophila differens (species) [taxon 7219], Drosophila miranda (species) [taxon 7229]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12928591/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12928591/full.md

## References

91 references — full list in the complete paper: https://tomesphere.com/paper/PMC12928591/full.md

---
Source: https://tomesphere.com/paper/PMC12928591