# Identification of universal grass genes and estimates of their monocot-/commelinid-/grass-specificity

**Authors:** Rowan A C Mitchell

PMC · DOI: 10.1093/bioadv/vbaf079 · Bioinformatics Advances · 2025-04-07

## TL;DR

This paper identifies groups of protein-coding genes common to all grasses (universal) and estimates which of them likely have monocot-, commelinid-, or grass-specific function.

## Contribution

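A novel pipeline builds hidden Markov model (HMM) profiles from 16 grass genomes to define 13,312 universal grass gene groups, then uses the same profiles against non-grass genomes to estimate each group's specificity.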

## Key findings

- 13,312 universal grass gene groups were identified using 16 grass genomes.
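- 4,609 of these groups were classified as likely of monocot-/commelinid-/grass-specific function, based on comparison with many non-grass genomes.
- 98.8% of the universal groups also had gene matches in recently sequenced genomes from two major grass clades not used in the pipeline.
- The HMM-based approach discriminated between test sets of known specific and non-specific genes better than simple percentage identity.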
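As a quick sense of scale, the two group counts above imply the share of universal groups flagged as likely specific. The short calculation below uses only those two reported counts; the variable names are illustrative and not taken from the paper.

```python
# Counts reported in the summary above.
UNIVERSAL_GROUPS = 13_312   # universal grass gene groups
SPECIFIC_GROUPS = 4_609     # groups flagged as likely monocot-/commelinid-/grass-specific

# Share of universal groups flagged as likely specific.
share = SPECIFIC_GROUPS / UNIVERSAL_GROUPS
print(f"{share:.1%} of universal groups are flagged as likely specific")
```

So roughly a third of the universal groups fall in the likely-specific category.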

## Abstract

Where experiments identify sets of grass genes of unknown function, e.g. underlying a QTL or co-expressed in a transcriptome, it is useful to know which of these genes are common to all grasses (universal) and whether they likely have monocot-/commelinid-/grass-specific function.

A pipeline used data on 16 grass full genomes from Ensembl Plants to generate 13 312 highly conserved, universal groups of grass protein-coding genes. Validation steps showed that 98.8% of these groups also had gene matches in recently sequenced genomes from two major grass clades not used in the pipeline. Comparison with many non-grass genomes identified 4609 of these groups as likely of monocot-/commelinid-/grass-specific function. Both grouping of genes and specificity were defined using hidden Markov model (HMM) profiles of the groups. The HMM-based approach performed better than simple percentage identity in discriminating between test sets of known specific and non-specific genes. The results give novel insight into the nature of monocot-/commelinid-/grass-specific genes. Researchers can use the universal_grass_peps database to gain evidence for their experimentally identified grass genes being involved in monocot-/commelinid-/grass-specific traits.

The universal_grass_peps database is available for download at https://data.rothamsted.ac.uk/dataset/universal_grass_peps.
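The notion of a "universal" group in the abstract can be illustrated with a minimal sketch. It assumes group membership has already been decided (in the paper this is done with HMM profiles, which are not reproduced here) and takes "universal" in its strictest reading: a group with at least one member gene in every reference genome. The function name, data layout, and toy genome labels are hypothetical, and the pipeline's actual criteria may differ.

```python
from typing import Dict, Set


def universal_groups(
    genomes_per_group: Dict[str, Set[str]],
    reference_genomes: Set[str],
) -> Set[str]:
    """Return IDs of groups that have a member in every reference genome.

    genomes_per_group maps a group ID to the set of genomes in which at
    least one member gene was found (hypothetical layout for illustration).
    """
    return {
        group_id
        for group_id, present_in in genomes_per_group.items()
        if reference_genomes <= present_in  # subset test: all genomes covered
    }


# Toy example with three placeholder genomes (the paper used 16).
reference = {"genome_A", "genome_B", "genome_C"}
groups = {
    "group_1": {"genome_A", "genome_B", "genome_C"},  # present everywhere
    "group_2": {"genome_A", "genome_C"},              # missing from genome_B
}
result = universal_groups(groups, reference)
print(sorted(result))
```

In this toy input only `group_1` passes the filter, because `group_2` has no member in `genome_B`.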

## Full-text entities

- **Chemicals:** lignin (MESH:D008031), silica (MESH:D012822), CBH (-), Si (MESH:D012825), xylan (MESH:D014990), dietary fibre (MESH:D004043), amino acid (MESH:D000596), S (MESH:D013455), AX (MESH:C085118), nucleotide (MESH:D009711)
- **Species:** Panicum hallii (species) [taxon 206008], Homo sapiens (human, species) [taxon 9606], Zea mays (maize, species) [taxon 4577], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Bambuseae (bamboo, tribe) [taxon 147376], Setaria italica (foxtail millet, species) [taxon 4555], Hordeum vulgare (barley, species) [taxon 4513], Phragmites australis (common reed, species) [taxon 29695], Setaria viridis (species) [taxon 4556], Saccharum spontaneum (fodder cane, species) [taxon 62335], Eragrostis curvula (Boer love grass, species) [taxon 38414], Phyllostachys violascens (species) [taxon 1903417], commelinids (clade) [taxon 4734], Oryza rufipogon (brownbeard rice, species) [taxon 4529], Panicum miliaceum (broomcorn millet, species) [taxon 4540], Secale cereale (rye, species) [taxon 4550], P. australis [taxon 425650], Lolium perenne (perennial ryegrass, species) [taxon 4522], Lolium (genus) [taxon 4520], Brachypodium distachyon (annual false brome, species) [taxon 15368], Sorghum bicolor (broomcorn, species) [taxon 4558], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Echinochloa crus-galli (barnyard grass, species) [taxon 90397], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12098945/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12098945/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12098945/full.md

---
Source: https://tomesphere.com/paper/PMC12098945