# Fast phenotype simulation for genotype representation graphs

**Authors:** Aditya Syam, Chris Adonizio, Xinzhu Wei

PMC · DOI: 10.1093/bioadv/vbag040 · Bioinformatics Advances · 2026-02-06

## TL;DR

This paper introduces GrgPhenoSim, a phenotype simulator for Genotype Representation Graphs (GRGs) that runs dozens to hundreds of times faster than tstrait and is suited to biobank-scale datasets.

## Contribution

GrgPhenoSim is a new phenotype simulator for GRGs. For sample sizes from thousands to hundreds of thousands, it is dozens to hundreds of times faster than tstrait, a fast ancestral recombination graph-based phenotype simulator.

## Key findings

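- GrgPhenoSim is dozens to hundreds of times faster than tstrait when the sample size ranges from thousands to hundreds of thousands of samples.
- The tool covers the primary functionalities of a phenotype simulator, uses a standardized output, and supports customized simulations.
- The GRG format itself enables dynamic programming-style algorithms, such as genome-wide association studies, on biobank-scale data; GrgPhenoSim supplies simulated phenotypes at the same scale.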

## Abstract

The Genotype Representation Graph (GRG) is a graph representation of whole-genome polymorphisms, designed to encode the variant hard-call information in phased whole genomes. It encodes the genotypes as an extremely compact graph that can be traversed efficiently. This enables dynamic programming-style algorithms for applications such as genome-wide association studies, which run faster on biobank-scale data than existing alternatives. To facilitate scalable statistical genetics, we present GrgPhenoSim, an extremely fast phenotype simulator for GRGs, suitable for simulating phenotypes on biobank-scale datasets.
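To make the "dynamic programming-style" traversal concrete, here is a minimal sketch on a toy graph. It is not the GRG file format or the grgl API. The function name, the dictionary layout, and the example graph are all hypothetical. The sketch assumes that mutations sit on nodes, that every sample below a node carries that node's mutations, and that each node reaches a given sample by at most one path. Under those assumptions, one top-down pass accumulates each sample's summed effect size without expanding a genotype matrix.

```python
def genetic_values(children, node_effect, topo_order, samples):
    """Sum mutation effects down a toy DAG (hypothetical layout, not grgl).

    children:    dict mapping node -> list of child nodes
    node_effect: dict mapping node -> summed effect of mutations on that node
    topo_order:  all nodes, with every parent listed before its children
    samples:     the leaf nodes that represent sampled haplotypes
    """
    # acc[n] starts as the effect of the mutations placed directly on n.
    acc = {n: node_effect.get(n, 0.0) for n in topo_order}
    for node in topo_order:
        # acc[node] is final here because all of its parents were processed.
        for child in children.get(node, ()):
            acc[child] += acc[node]
    return {s: acc[s] for s in samples}


# Toy graph: r sits above a and s2; a sits above s0 and s1; b sits above s1.
children = {"r": ["a", "s2"], "a": ["s0", "s1"], "b": ["s1"]}
node_effect = {"r": 2.0, "a": -1.0, "b": 0.5}
topo_order = ["r", "b", "a", "s0", "s1", "s2"]
values = genetic_values(children, node_effect, topo_order, ["s0", "s1", "s2"])
```

Each edge is visited once, so the cost scales with the number of edges in the graph rather than with samples times variants. That is the general reason a compact graph encoding can help at biobank scale.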

GrgPhenoSim contains all the primary functionalities of a phenotype simulator, uses a standardized output, and supports customized simulations. GrgPhenoSim is dozens to hundreds of times faster than tstrait, a fast ancestral recombination graph-based phenotype simulator, when the sample size ranges from thousands to hundreds of thousands of samples.
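The abstract does not list the simulator's individual steps. As a hedged illustration, the sketch below follows the generic additive model that phenotype simulators commonly use: pick causal sites, draw effect sizes, sum them into genetic values, and add environmental noise scaled to a target narrow-sense heritability. It uses a small dense genotype matrix and only the Python standard library. The function name `simulate_additive_phenotypes` and its parameters are hypothetical and do not reflect GrgPhenoSim's API.

```python
import random
import statistics


def simulate_additive_phenotypes(genotypes, n_causal, heritability, seed=0):
    """Generic additive phenotype model (illustrative, not GrgPhenoSim).

    genotypes:    list of per-sample rows of allele dosages (0, 1, or 2)
    n_causal:     number of sites given a nonzero effect
    heritability: target h^2 in (0, 1]
    """
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    rng = random.Random(seed)
    n_sites = len(genotypes[0])

    # Step 1: choose causal sites and draw standard normal effect sizes.
    causal_sites = rng.sample(range(n_sites), n_causal)
    effects = {site: rng.gauss(0.0, 1.0) for site in causal_sites}

    # Step 2: genetic value = sum of dosage * effect over causal sites.
    genetic = [sum(row[s] * b for s, b in effects.items()) for row in genotypes]

    # Step 3: pick the noise variance so var_g / (var_g + var_e) equals h^2.
    var_g = statistics.pvariance(genetic)
    var_e = var_g * (1.0 - heritability) / heritability
    noise = [rng.gauss(0.0, var_e ** 0.5) for _ in genetic]

    phenotypes = [g + e for g, e in zip(genetic, noise)]
    return effects, genetic, phenotypes


genotypes = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1], [0, 2, 2, 1]]
effects, genetic, phenotypes = simulate_additive_phenotypes(
    genotypes, n_causal=2, heritability=0.5, seed=42
)
```

The target heritability holds only in expectation: the realized ratio in any one draw differs because the noise is sampled. A graph-based simulator would replace step 2 with a graph traversal, while the other steps stay the same in spirit.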

The GrgPhenoSim library and use-case demonstrations are available at https://github.com/aprilweilab/grg_pheno_sim. The documentation for GrgPhenoSim is hosted at https://grgl.readthedocs.io/en/stable/examples_and_applications.html#phenotype-simulation.

## Full-text entities

- **Diseases:** type II diabetes (MESH:D003924)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12927419/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12927419/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12927419/full.md

---
Source: https://tomesphere.com/paper/PMC12927419