# GWarrange: a pre- and post- genome-wide association studies pipeline for detecting phenotype-associated genome rearrangement events

**Authors:** Yi Ling Tam, Sarah Cameron, Andrew Preston, Lauren Cowley

PMC · DOI: 10.1099/mgen.0.001268 · 2024-07-09

## TL;DR

GWarrange is a pipeline that uses k-mers to detect genome rearrangements linked to phenotypes in bacterial genome-wide association studies.

## Contribution

GWarrange introduces a novel method to interpret k-mers in the context of genome rearrangements using placeholder sequences for repeats.

## Key findings

- GWarrange identifies phenotype-associated genome rearrangements in bacterial genome sets.
- The pipeline replaces repeat sequences with short placeholder sequences so that the regions flanking a repeat fit within relatively short k-mers at rearrangement boundaries.
- Four case studies, based on Bordetella pertussis, Enterococcus faecium and a simulated genome set, demonstrate the pipeline's ability to identify phenotype-associated rearrangements.
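The placeholder idea in the second finding can be illustrated with a minimal Python sketch. It is not the GWarrange implementation: the toy sequences, the `NNNN` placeholder, the exact-match replacement, and the helper names `mask_repeats`, `kmers` and `spanning_kmers` are all assumptions made for illustration.

```python
def mask_repeats(genome: str, repeat: str, placeholder: str = "NNNN") -> str:
    """Replace every exact copy of `repeat` with a short placeholder (assumed form)."""
    return genome.replace(repeat, placeholder)


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def spanning_kmers(seq: str, k: int, placeholder: str = "NNNN") -> set[str]:
    """Return k-mers that hold the placeholder with flanking bases on both sides."""
    spanning = set()
    for kmer in kmers(seq, k):
        pos = kmer.find(placeholder)
        # pos > 0: at least one left-flank base; the second test: at least one right-flank base.
        if pos > 0 and pos + len(placeholder) < len(kmer):
            spanning.add(kmer)
    return spanning


# Toy data (hypothetical): a 16 bp repeat between two 6 bp flanks.
LEFT, REPEAT, RIGHT = "ACGTAC", "TTTTGGGGTTTTGGGG", "CATGCA"
genome = LEFT + REPEAT + RIGHT

masked = mask_repeats(genome, REPEAT)
boundary_kmers = spanning_kmers(masked, k=10)

print(masked)
print(sorted(boundary_kmers))
```

With k = 10, no k-mer of the unmasked toy genome can contain bases from both flanks, because the repeat alone is 16 bp long. After masking, several 10-mers cover both flanks at once, which is what lets a k-mer-based association test see which flanks sit next to each other.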

## Abstract

The use of k-mers to capture genetic variation in bacterial genome-wide association studies (bGWAS) has demonstrated its effectiveness in overcoming the plasticity of bacterial genomes by providing a comprehensive array of genetic variants in a genome set that is not confined to a single reference genome. However, little attempt has been made to interpret k-mers in the context of genome rearrangements, partly due to challenges in the exhaustive and high-throughput identification of genome structure and individual rearrangement events. Here, we present GWarrange, a pre- and post-bGWAS processing methodology that leverages the unique properties of k-mers to facilitate bGWAS for genome rearrangements. Repeat sequences are common instigators of genome rearrangements through intragenomic homologous recombination, and they are commonly found at rearrangement boundaries. Using whole-genome sequences, repeat sequences are replaced by short placeholder sequences, allowing the regions flanking repeats to be incorporated into relatively short k-mers. Then, locations of flanking regions in significant k-mers are mapped back to complete genome sequences to visualise genome rearrangements. Four case studies based on two bacterial species (Bordetella pertussis and Enterococcus faecium) and a simulated genome set are presented to demonstrate the ability to identify phenotype-associated rearrangements. GWarrange is available at https://github.com/DorothyTamYiLing/GWarrange (https://github.com/DorothyTamYiLing/genome_rearrangement.git).
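The mapping-back step described in the abstract can be sketched as follows. This is a hedged toy example rather than the pipeline's actual code: it assumes an `NNNN` placeholder, exact string matching instead of a real aligner, made-up sequences, and hypothetical helpers `revcomp`, `split_flanks` and `locate`. Genome B differs from genome A by an inversion of the segment between two inverted repeat copies.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_flanks(kmer: str, placeholder: str = "NNNN") -> tuple[str, str]:
    """Split a placeholder-containing k-mer into its left and right flanks."""
    left, _, right = kmer.partition(placeholder)
    return left, right


def locate(flank: str, genome: str):
    """Return (0-based start, strand) of the first exact match, or None."""
    pos = genome.find(flank)
    if pos != -1:
        return pos, "+"
    pos = genome.find(revcomp(flank))
    if pos != -1:
        return pos, "-"
    return None


# Toy genomes (hypothetical): X, repeat R, segment Y, inverted repeat copy, Z.
X, R, Y, Z = "AACCGGTTAC", "TTTTGGGGTTTTGGGG", "GATTACAGCATCGA", "CTAGGATCCA"
genome_a = X + R + Y + revcomp(R) + Z           # reference arrangement
genome_b = X + R + revcomp(Y) + revcomp(R) + Z  # Y inverted between the repeats

# A k-mer assumed to be significant in the association test.
left, right = split_flanks("GTTACNNNNGATTA")

for name, genome in (("A", genome_a), ("B", genome_b)):
    print(name, "left:", locate(left, genome), "right:", locate(right, genome))
```

In genome A both flanks sit on the forward strand on either side of the first repeat copy. In genome B the right flank is found on the reverse strand next to the second repeat copy, which is the signature of an inversion between the two repeats. Plotting such flank positions per genome is one way to visualise the rearrangement.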

## Linked entities

- **Species:** Bordetella pertussis (taxon 520), Enterococcus faecium (taxon 1352)

## Full-text entities

- **Species:** Bordetella pertussis (species) [taxon 520], Enterococcus faecium (species) [taxon 1352]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11316554/full.md

---
Source: https://tomesphere.com/paper/PMC11316554