# The Site/Group Extended Data Format and Tools

**Authors:** Julien Y Dutheil, Diyar Hamidi, Basile Pajot

PMC · DOI: 10.1093/gbe/evae011 · Genome Biology and Evolution · 2024-01-22

## TL;DR

This paper introduces the site/group extended data format, a simple text format for storing (groups of) site annotations, and SgedTools, a toolset that helps combine the results of comparative sequence analyses and map them onto molecular structures.

## Contribution

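The site/group extended data format stores (groups of) site annotations as simple text. SgedTools manipulates these files, creates them from various software outputs, and translates coordinates between individual sequences, alignments, and three-dimensional structures.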

## Key findings

- The site/group extended data format is a simple text format for storing annotations of individual sites or groups of sites.
- SgedTools can translate coordinates between sequences, alignments, and 3D structures.
- A Monte-Carlo procedure in the package generates random site samples, optionally conditioned on site-specific features, which eases statistical testing of evolutionary hypotheses.
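
To make the coordinate-translation finding concrete, here is a minimal sketch of how alignment columns can be mapped to positions in one ungapped sequence. It is not the SgedTools implementation or its command-line interface. The function names, the 1-based numbering, and the toy aligned sequence are all assumptions made for illustration.

```python
def alignment_to_sequence_index(aligned_seq, gap_chars="-"):
    """Map 1-based alignment columns to 1-based residue positions.

    Columns holding a gap in this sequence map to None.
    (Hypothetical helper, not part of SgedTools.)
    """
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char in gap_chars:
            mapping[column] = None
        else:
            position += 1
            mapping[column] = position
    return mapping


def translate_group(group, mapping):
    """Translate a group of alignment columns into sequence positions."""
    return [mapping.get(column) for column in group]


# Toy aligned sequence (made up for this example).
aligned = "MK-VL--A"
col_to_res = alignment_to_sequence_index(aligned)

# A group of sites given in alignment coordinates.
translated = translate_group([1, 3, 4, 8], col_to_res)
print(translated)
```

A site that falls on a gap has no counterpart in the target sequence, so the sketch returns `None` for it rather than guessing a neighbouring residue. Mapping onto a three-dimensional structure would need a further step of the same kind, from sequence positions to the residue numbering used in the structure file.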

## Abstract

Comparative sequence analysis permits unraveling the molecular processes underlying gene evolution. Many statistical methods generate candidate positions within genes, such as fast or slowly evolving sites, coevolving groups of residues, sites undergoing positive selection, or changes in evolutionary rates. Understanding the functional causes of these evolutionary patterns requires combining the results of these analyses and mapping them onto molecular structures, a complex task involving distinct coordinate referential systems. To ease this task, we introduce the site/group extended data format, a simple text format to store (groups of) site annotations. We developed a toolset, the SgedTools, which permits site/group extended data file manipulation, creating them from various software outputs and translating coordinates between individual sequences, alignments, and three-dimensional structures. The package also includes a Monte-Carlo procedure to generate random site samples, possibly conditioning on site-specific features. This eases the statistical testing of evolutionary hypotheses, accounting for the structural properties of the encoded molecules.
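
The idea of drawing random site samples conditioned on a site-specific feature can be sketched as follows. This is a hedged illustration of a generic conditional Monte-Carlo test, not the procedure shipped in SgedTools. The feature labels (`"buried"`, `"exposed"`), the span statistic, and all function names are assumptions introduced for the example.

```python
import random
from collections import defaultdict


def sample_matched_group(observed_group, site_class, rng):
    """Draw a random group whose sites match the feature classes of the
    observed group, one class-matched site per observed site, without
    replacement. (Hypothetical helper for illustration.)
    """
    by_class = defaultdict(list)
    for site, cls in site_class.items():
        by_class[cls].append(site)
    chosen = set()
    for site in observed_group:
        candidates = [s for s in by_class[site_class[site]] if s not in chosen]
        chosen.add(rng.choice(candidates))
    return sorted(chosen)


def monte_carlo_p_value(observed_group, site_class, statistic, n_rep=1000, seed=0):
    """One-sided p-value: share of random groups with a statistic at least
    as large as the observed one, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = statistic(observed_group)
    hits = sum(
        statistic(sample_matched_group(observed_group, site_class, rng)) >= observed
        for _ in range(n_rep)
    )
    return (hits + 1) / (n_rep + 1)


# Toy data: a made-up feature class for each site.
site_class = {1: "buried", 2: "buried", 3: "exposed",
              4: "exposed", 5: "buried", 6: "exposed"}
group = [1, 4]


def span(sites):
    """Toy statistic: distance between the first and last site."""
    return max(sites) - min(sites)


p_value = monte_carlo_p_value(group, site_class, span, n_rep=500, seed=1)
print(p_value)
```

Conditioning the null samples on the feature classes keeps each random group comparable to the observed one, so a small p-value is less likely to reflect the structural property alone. In practice the statistic would be a measure computed on the structure or the alignment rather than this toy span.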

## Full-text entities

- **Genes:** LYZ (lysozyme) [NCBI Gene 4069] {aka AMYLD5, LYZF1, LZM}
- **Chemicals:** carbons (MESH:D002244)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10849175/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10849175/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC10849175/full.md

---
Source: https://tomesphere.com/paper/PMC10849175