# Sensommatic: an efficient pipeline to mine and predict sensory receptor genes in the era of reference-quality genomes

**Authors:** Louise Ryan, Colleen Lawless, Graham M Hughes

PMC · DOI: 10.1093/bioinformatics/btae040 · 2024-01-23

## TL;DR

Sensommatic is an automated pipeline that mines and predicts sensory receptor genes from whole genome assemblies, targeting the receptors that conventional reference-based annotation tools tend to undercount.

## Contribution

Sensommatic introduces an automated pipeline for sensory receptor gene annotation that is scalable and generalizable across species.

## Key findings

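- Sensommatic combines BLAST and AUGUSTUS to mine and predict sensory receptor genes from whole genome assemblies, using a one-to-many gene mapping approach.
- The pipeline targets a known weakness of conventional reference-based annotation tools, which often underestimate sensory receptor counts because these gene families are highly species-specific.
- Although designed for vertebrates, Sensommatic can be run on non-vertebrate species by generating customized reference files.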

## Abstract

Sensory receptor gene families have undergone extensive expansion and loss across vertebrate evolution, leading to significant variation in receptor counts between species. However, due to their species-specific nature, conventional reference-based annotation tools often underestimate the true number of sensory receptors in a given species. While there has been an exponential increase in the taxonomic diversity of publicly available genome assemblies in recent years, only ∼30% of vertebrate species on the NCBI database are currently annotated. To overcome these limitations, we developed ‘Sensommatic’, an automated and accessible sensory receptor annotation pipeline. Sensommatic implements BLAST and AUGUSTUS to mine and predict sensory receptor genes from whole genome assemblies, adopting a one-to-many gene mapping approach. While designed for vertebrates, Sensommatic can be extended to run on non-vertebrate species by generating customized reference files, making it a scalable and generalizable tool.

Source code and associated files are available at: https://github.com/GMHughes/Sensommatic
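
To make the one-to-many mapping idea in the abstract concrete, here is a minimal sketch of how BLAST hits from a few reference receptors could be collapsed into candidate loci before gene prediction. It is not taken from the Sensommatic source and may differ from what the pipeline does. It assumes BLAST tabular output (`-outfmt 6`) with the standard 12 columns. The function names, the e-value cutoff of `1e-10`, and the example hit lines are all hypothetical.

```python
from collections import defaultdict

# Hypothetical BLAST tabular (outfmt 6) lines, invented for illustration.
# Columns: qseqid sseqid pident length mismatch gapopen
#          qstart qend sstart send evalue bitscore
EXAMPLE_LINES = [
    "OR_ref1\tchr1\t85.0\t300\t45\t0\t1\t300\t1000\t1900\t1e-50\t200",
    "OR_ref2\tchr1\t80.0\t300\t60\t0\t1\t300\t1500\t2400\t1e-40\t180",
    "OR_ref1\tchr1\t78.0\t300\t66\t0\t1\t300\t9000\t8100\t1e-30\t150",
    "OR_ref1\tchr2\t40.0\t50\t30\t0\t1\t50\t100\t250\t0.5\t20",
]


def parse_outfmt6(lines):
    """Parse BLAST tabular lines into dicts with normalized coordinates."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        hits.append({
            "query": fields[0],
            "contig": fields[1],
            # Reverse-strand hits are reported with sstart > send.
            "start": min(sstart, send),
            "end": max(sstart, send),
            "evalue": float(fields[10]),
        })
    return hits


def merge_into_loci(hits, max_evalue=1e-10):
    """Merge overlapping hits on each contig into candidate loci.

    max_evalue is an assumed cutoff, not a value from the paper.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= max_evalue:
            by_contig[hit["contig"]].append(hit)

    loci = []
    for contig in sorted(by_contig):
        current = None
        for hit in sorted(by_contig[contig], key=lambda h: h["start"]):
            if current is not None and hit["start"] <= current["end"]:
                # Overlaps the open locus: extend it and record the query.
                current["end"] = max(current["end"], hit["end"])
                current["queries"].add(hit["query"])
            else:
                current = {
                    "contig": contig,
                    "start": hit["start"],
                    "end": hit["end"],
                    "queries": {hit["query"]},
                }
                loci.append(current)
    return loci


def loci_per_query(loci):
    """Count how many candidate loci each reference query maps to."""
    counts = defaultdict(int)
    for locus in loci:
        for query in locus["queries"]:
            counts[query] += 1
    return dict(counts)


if __name__ == "__main__":
    candidate_loci = merge_into_loci(parse_outfmt6(EXAMPLE_LINES))
    for locus in candidate_loci:
        print(locus["contig"], locus["start"], locus["end"], sorted(locus["queries"]))
    print(loci_per_query(candidate_loci))
```

In this sketch a single reference sequence can map to several loci, and several references can support the same locus. That is the behaviour a one-to-many approach needs in expanded gene families. In a pipeline like the one described, each merged region would then be handed to a gene predictor such as AUGUSTUS.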

## Full-text entities

- **Genes:** VN1R17P (vomeronasal 1 receptor 17 pseudogene) [NCBI Gene 441931] {aka GPCR}, RHO (rhodopsin) [NCBI Gene 6010] {aka CSNBAD1, OPN2, RP4}
- **Species:** Pongo abelii (orang utan, species) [taxon 9601], Mus musculus (house mouse, species) [taxon 10090], Canis lupus familiaris (dog, subspecies) [taxon 9615], Anas platyrhynchos (duck, species) [taxon 8839], Ornithorhynchus anatinus (duck-billed platypus, species) [taxon 9258], Anolis carolinensis (Carolina anole, species) [taxon 28377], Xenopus laevis (African clawed frog, species) [taxon 8355], Microcaecilia unicolor (species) [taxon 1415580], Monodelphis domestica (gray short-tailed opossum, species) [taxon 13616], Homo sapiens (human, species) [taxon 9606], Loxodonta (African elephants, genus) [taxon 9784], Rattus norvegicus (brown rat, species) [taxon 10116], Bufo bufo (common European toad, species) [taxon 8384], Pan troglodytes (chimpanzee, species) [taxon 9598], Danio rerio (leopard danio, species) [taxon 7955], Equus caballus (domestic horse, species) [taxon 9796], Bos taurus (bovine, species) [taxon 9913]
- **Mutations:** T2T, start/stop
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC10832353/full.md

---
Source: https://tomesphere.com/paper/PMC10832353