# In silico identification of archaeal DNA-binding proteins

**Authors:** Linus Donvil, Joëlle A J Housmans, Eveline Peeters, Wim Vranken, Gabriele Orlando

PMC · DOI: 10.1093/bioinformatics/btaf169 · Bioinformatics · 2025-05-02

## TL;DR

This paper introduces Xenusia, a tool that identifies DNA-binding proteins in archaea using machine learning, helping to explore these under-researched organisms.

## Contribution

The novel contribution is Xenusia, a neural network-based tool that screens entire archaeal proteomes, including metagenomics-derived data, to identify DNA-binding proteins.

## Key findings

- Xenusia identifies DNA-binding proteins in archaea, including novel ones found in metagenomics data.
- Predictions made by Xenusia have been experimentally validated.
- Xenusia is effective across diverse datasets and is publicly available as a PyPI package, as source code on GitHub, and as a Google Colab notebook.

## Abstract

The rapid advancement of next-generation sequencing technologies has generated an immense volume of genetic data. However, these data are unevenly distributed: well-studied organisms are disproportionately represented, while others, such as archaea, remain significantly underexplored. The study of archaea is particularly challenging due to the extreme environments they inhabit and the difficulties associated with culturing them in the laboratory. Despite these challenges, archaea likely represent a crucial evolutionary link between eukaryotic and prokaryotic organisms, and their investigation could shed light on the early stages of life on Earth. Yet a significant portion of archaeal proteins is annotated with limited or inaccurate information. Among the various classes of archaeal proteins, DNA-binding proteins are of particular importance. While they represent a large portion of every known proteome, their identification in archaea is complicated by the substantial evolutionary divergence between archaea and other, better-studied organisms.

To address these challenges, we developed Xenusia, a neural network-based tool capable of screening entire archaeal proteomes to identify DNA-binding proteins. Xenusia has proven effective across diverse datasets, including metagenomics data, and has identified novel DNA-binding proteins, with experimental validation of its predictions.
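The proteome-screening workflow described above can be pictured with a minimal sketch: read every protein from a FASTA file, score each sequence, and keep those above a cutoff. This is not Xenusia's code or API. The function names (`parse_fasta`, `placeholder_score`, `screen_proteome`), the 0.2 threshold, and the toy sequences are all hypothetical, and the scorer is a deliberately crude stand-in (fraction of lysine and arginine residues) for the neural network the paper describes.

```python
from typing import Callable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            # Use the first whitespace-delimited token as the identifier.
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def placeholder_score(sequence: str) -> float:
    """Hypothetical stand-in scorer, NOT Xenusia's model:
    the fraction of Lys (K) and Arg (R) residues in the sequence."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("K") + upper.count("R")) / len(upper)


def screen_proteome(
    fasta_text: str,
    scorer: Callable[[str], float] = placeholder_score,
    threshold: float = 0.2,  # arbitrary illustrative cutoff
) -> List[Tuple[str, float]]:
    """Score every protein and return candidates at or above the threshold,
    sorted from highest to lowest score."""
    hits = []
    for protein_id, sequence in parse_fasta(fasta_text):
        score = scorer(sequence)
        if score >= threshold:
            hits.append((protein_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy input: two made-up sequences, not real archaeal proteins.
EXAMPLE_FASTA = """>protA toy basic protein
MKRKKRLAKR
>protB toy acidic protein
MDEEDLLAEE
"""

hits = screen_proteome(EXAMPLE_FASTA)
for protein_id, score in hits:
    print(f"{protein_id}\t{score:.2f}")
```

Passing the scorer as a parameter keeps the screening loop independent of the model, so a trained predictor could replace the placeholder without changing the rest of the sketch.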

Xenusia is available as a PyPI package, with source code accessible at https://github.com/grogdrinker/xenusia, and as a Google Colab web server application at xenusia.ipynb.

## Linked entities

- **Species:** Archaea (taxon 2157)

## Full-text entities

- **Genes:** alcohol dehydrogenase [NCBI Gene 13909458], mobA [NCBI Gene 13877150], DNA-binding protein [NCBI Gene 28379184]
- **Chemicals:** amino acids (MESH:D000596), P1 (MESH:C480041), imidazole (MESH:C029899), IPTG (MESH:D007544), His (MESH:D006639), SDS (MESH:D012967), acrylamide (MESH:D020106), L-isoleucine (MESH:D007532), NN1 (-), 32P (MESH:C000615311), NaCl (MESH:D012965)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Staphylococcus aureus (species) [taxon 1280], Danio rerio (leopard danio, species) [taxon 7955], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** DH5alpha — Drosophila hydei (Fruit fly), Spontaneously immortalized cell line (CVCL_Z531), BL21 E. coli — Homo sapiens (Human), EBV-related Burkitt lymphoma, Cancer cell line (CVCL_M639), E. coli — Mus musculus (Mouse), Hybridoma (CVCL_C5CR), ANR-23- — Homo sapiens (Human), Transformed cell line (CVCL_B2RM), Rosetta (DE3 — Mus musculus (Mouse), Hybridoma (CVCL_B7HM), -0061 — Homo sapiens (Human), Transformed cell line (CVCL_K336), S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232), pET24a — Mus musculus (Mouse), Hybridoma (CVCL_C5HY)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12065626/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12065626/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12065626/full.md

---
Source: https://tomesphere.com/paper/PMC12065626