# Candidate orphan genes: Reassessing uniqueness

**Authors:** Sheetalpreet K. Maan, Xuan Y. Butzin, Steven X. Ge, Nicholas C. Butzin

PMC · DOI: 10.1371/journal.pone.0338891 · PLOS One · 2025-12-31

## TL;DR

Many bacterial genes classified as orphans in a 2023 dataset have homologs in other bacterial taxa when rechecked against 2025 data, which cuts the orphan gene count by approximately 81%.

## Contribution

The study introduces a refined dataset of bacterial 'candidate orphan genes' and proposes updated annotation practices to reflect their provisional status.

## Key findings

- Reanalysis of a 2023 bacterial orphan gene dataset revealed an approximately 81% decrease in orphan gene counts within two years, attributed to expanded genomic data.
- Many previously labeled orphan genes now have homologs in other bacterial taxa.
- The study recommends annotating such genes as 'candidate' or 'putative' to reflect the provisional nature of their apparent uniqueness.

## Abstract

Orphan genes lack recognizable homologs outside a given taxonomic unit; thus, they have uncertain evolutionary origins. This presents a profound challenge to traditional models of gene evolution. Their presence has fueled ongoing debate, and they have long been implicated in driving lineage-specific traits in medicine and evolutionary biology. These genes are often linked to species-specific traits and pathogenic mechanisms, including virulence and environmental adaptation, and their study provides critical insights into the origins and evolution of novel genes. Intrigued by their enigmatic nature, we re-analyzed a comprehensive 2023 dataset of orphan genes compiled from over 80,000 bacterial species. Using homology-based analyses, we reassessed the taxonomic distribution of each gene across a broader genomic landscape. Many “orphan genes” identified in 2023 now align with homologs in other bacterial taxa (as of 2025), demonstrating that limited database sampling had previously inflated the number of apparent orphan genes. This reassessment revealed an approximately 81% decrease in the number of orphan genes within just two years. These results challenge the long-held view that bacterial species truly harbor large numbers of orphan genes, instead demonstrating that their prevalence has been overestimated. To better reflect these findings, we propose that orphan genes be annotated using descriptors such as ‘candidate’ or ‘putative’, which more accurately convey the provisional and potentially temporary nature of their apparent uniqueness. Although our analysis greatly reduced false-positive classifications, it cannot determine whether a given candidate truly encodes a functional gene or is an artifact of bioinformatic analysis. To prioritize the most promising targets for biochemical or genetic validation, we applied additional computational filters and identified a subset of candidates most likely to encode bona fide proteins.
This study redefines current understanding of orphan gene prevalence, establishes that such genes should be annotated with descriptors such as candidate or putative, much like we label “candidate bacterial species,” and provides a refined, high-confidence dataset for future in vitro and in vivo investigations.
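
The reassessment logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Hit` record, the `reassess_orphans` and `percent_decrease` helpers, the `1e-5` E-value cutoff, and the toy gene and taxon names are all assumptions made for illustration. The sketch assumes homology search results are already available as (gene, hit taxon, E-value) records, and it keeps a gene as a candidate orphan only when no sufficiently strong hit falls outside the gene's own taxon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One homology search hit (hypothetical record layout)."""
    gene_id: str    # query gene from the orphan list
    hit_taxon: str  # taxon of the subject sequence
    evalue: float   # E-value reported by the search tool


def reassess_orphans(gene_taxon, hits, max_evalue=1e-5):
    """Split genes into (still candidate orphans, reclassified).

    A gene is reclassified when at least one hit from a taxon other than
    its own has an E-value at or below max_evalue. The cutoff is an
    assumed value, not one taken from the paper.
    """
    has_outside_homolog = set()
    for hit in hits:
        own_taxon = gene_taxon.get(hit.gene_id)
        if own_taxon is None:
            continue  # hit for a gene that is not in the orphan list
        if hit.hit_taxon != own_taxon and hit.evalue <= max_evalue:
            has_outside_homolog.add(hit.gene_id)
    still_candidates = sorted(g for g in gene_taxon if g not in has_outside_homolog)
    return still_candidates, sorted(has_outside_homolog)


def percent_decrease(before, after):
    """Percentage drop from an earlier count to a later count."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# Toy data, invented for illustration only.
gene_taxon = {"g1": "TaxonA", "g2": "TaxonA", "g3": "TaxonB",
              "g4": "TaxonB", "g5": "TaxonC"}
hits = [
    Hit("g1", "TaxonB", 1e-30),  # strong hit outside own taxon
    Hit("g2", "TaxonA", 1e-50),  # strong hit, but within own taxon
    Hit("g3", "TaxonC", 0.5),    # outside own taxon, but too weak
    Hit("g4", "TaxonA", 1e-12),  # strong hit outside own taxon
]

still, reclassified = reassess_orphans(gene_taxon, hits)
drop = percent_decrease(len(gene_taxon), len(still))
print(still, reclassified, drop)
```

In this toy run, two of five genes gain an outside homolog, so the candidate orphan count falls by 40%. The same arithmetic applied to the paper's before and after counts is what yields the reported figure of approximately 81%. A real analysis would also need to decide how to define the taxonomic unit and how to handle partial or low-coverage alignments, which this sketch ignores.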

## Full-text entities

- **Diseases:** cancers (MESH:D009369), TB (MESH:D014376)
- **Chemicals:** BioCyc (-)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Klebsiella pneumoniae (species) [taxon 573], Homo sapiens (human, species) [taxon 9606], Pseudomonas aeruginosa (species) [taxon 287], Mycobacterium tuberculosis (species) [taxon 1773], Legionella pneumophila (species) [taxon 446], Helicobacter pylori (species) [taxon 210], Bacillus subtilis (species) [taxon 1423], Enterococcus faecalis (species) [taxon 1351], Salmonella enterica (species) [taxon 28901], Pan troglodytes (chimpanzee, species) [taxon 9598], Chlamydia trachomatis (species) [taxon 813]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12755737/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12755737/full.md

## References

71 references — full list in the complete paper: https://tomesphere.com/paper/PMC12755737/full.md

---
Source: https://tomesphere.com/paper/PMC12755737