# A simple and effective method for simulating nested exchangeable correlated binary data for longitudinal cluster randomised trials

**Authors:** Rhys A. Bowden, Jessica Kasza, Andrew B. Forbes

PMC · DOI: 10.1186/s12874-024-02285-4 · BMC Medical Research Methodology · 2024-08-08

## TL;DR

This paper introduces a fast and flexible method for simulating binary data with a nested exchangeable correlation structure, useful for planning longitudinal cluster randomised trials and for assessing the methods used to analyse them.

## Contribution

The first practical method for simulating large binary datasets with nested exchangeable correlation structures.

## Key findings

- The proposed method is orders of magnitude faster than the most well known general simulation method.
- It allows a much wider range of correlations than alternative approaches.
- The method is demonstrated by simulating 1000 datasets using parameters derived from a real cluster randomised crossover trial of stress ulcer prophylaxis.

## Abstract

Simulation is an important tool for assessing the performance of statistical methods for the analysis of data and for the planning of studies. While methods are available for the simulation of correlated binary random variables, all have significant practical limitations for simulating outcomes from longitudinal cluster randomised trial designs, such as the cluster randomised crossover and the stepped wedge trial designs. For these trial designs, as the number of observations in each cluster increases, these methods either become computationally infeasible or their range of allowable correlations rapidly shrinks to zero.

In this paper we present a simple method for simulating binary random variables with a specified vector of prevalences and correlation matrix. This method allows for the outcome prevalence to change due to treatment or over time, and for a ‘nested exchangeable’ correlation structure, in which observations in the same cluster are more highly correlated if they are measured in the same time period than in different time periods, and where different individuals are measured in each time period. This means that our method is also applicable to more general hierarchical clustered data contexts, such as students within classrooms within schools. The method is demonstrated by simulating 1000 datasets with parameters matching those derived from data from a cluster randomised crossover trial assessing two variants of stress ulcer prophylaxis.
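To make the 'nested exchangeable' structure concrete, the following minimal sketch builds the correlation matrix for a single cluster. It is an illustration only and is not taken from the paper or from NestBin: the function name `nested_exchangeable_corr`, its parameter names, and the example values (2 periods, 3 individuals per period, correlations 0.05 and 0.02) are all assumptions chosen for this example.

```python
import numpy as np


def nested_exchangeable_corr(n_periods, n_per_period, rho_within, rho_between):
    """Correlation matrix for one cluster, with observations ordered period by period.

    Hypothetical helper for illustration: rho_within applies to two different
    individuals measured in the same period, rho_between to two individuals
    measured in different periods of the same cluster.
    """
    n = n_periods * n_per_period
    # Block-diagonal indicator: 1 where two observations share a period.
    same_period = np.kron(np.eye(n_periods), np.ones((n_per_period, n_per_period)))
    corr = rho_between * np.ones((n, n))                # baseline for any pair in the cluster
    corr += (rho_within - rho_between) * same_period    # extra correlation within a period
    corr += (1.0 - rho_within) * np.eye(n)              # bring the diagonal up to 1
    return corr


# Example values are assumptions, not parameters from the trial in the paper.
R = nested_exchangeable_corr(n_periods=2, n_per_period=3, rho_within=0.05, rho_between=0.02)
print(R.round(2))
```

In the printed matrix, each 3 × 3 diagonal block holds the within-period correlation off its diagonal, and the off-diagonal blocks hold the smaller between-period correlation. The same layout applies to other two-level nesting, such as classrooms within schools.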

Our method is orders of magnitude faster than the most well known general simulation method while also allowing a much wider range of correlations than alternative methods. An implementation of our method is available in the R package NestBin.

This simulation method is the first to allow for practical and efficient simulation of large datasets of binary outcomes with the commonly used nested exchangeable correlation structure. This will allow for much more effective testing of designs and inference methods for longitudinal cluster randomised trials with binary outcomes.
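As a rough illustration of what simulating such data can look like, the sketch below uses a simple mixture construction for the special case of a single common prevalence. It is not presented as the algorithm from the paper or as the NestBin implementation. Each individual copies a shared cluster-level draw, copies a shared cluster-period-level draw, or uses an independent draw. The function name `simulate_cluster_mixture` and all parameter values are assumptions made for this example.

```python
import numpy as np


def simulate_cluster_mixture(n_clusters, n_periods, n_per_period, p,
                             rho_within, rho_between, rng):
    """Binary outcomes with common prevalence p and nested exchangeable correlation.

    Illustrative mixture construction (an assumption, not the paper's method):
    with probability a an individual copies the cluster-level Bernoulli(p) draw,
    with probability b the cluster-period-level draw, otherwise an independent draw.
    Two individuals in different periods are correlated only when both copy the
    cluster draw (a**2); in the same period they can also both copy the period
    draw (a**2 + b**2).
    """
    if not (0.0 <= rho_between <= rho_within):
        raise ValueError("this sketch needs 0 <= rho_between <= rho_within")
    a = np.sqrt(rho_between)
    b = np.sqrt(rho_within - rho_between)
    if a + b > 1.0:
        raise ValueError("correlations too large for this mixture construction")

    shape = (n_clusters, n_periods, n_per_period)
    z_cluster = rng.random((n_clusters, 1, 1)) < p         # one draw per cluster
    z_period = rng.random((n_clusters, n_periods, 1)) < p  # one draw per cluster-period
    own = rng.random(shape) < p                            # one draw per individual
    u = rng.random(shape)                                  # selects the source to copy
    y = np.where(u < a, z_cluster, np.where(u < a + b, z_period, own))
    return y.astype(int)


# Example values are assumptions, not the trial parameters used in the paper.
rng = np.random.default_rng(2024)
y = simulate_cluster_mixture(n_clusters=1000, n_periods=2, n_per_period=50,
                             p=0.3, rho_within=0.05, rho_between=0.02, rng=rng)
print(y.shape, y.mean())
```

The returned array is indexed as cluster × period × individual. This construction cannot represent prevalences that vary by treatment or period, and it requires the square roots of the two correlation components to sum to at most 1, so it covers a narrower set of scenarios than the method described in the abstract.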

The online version contains supplementary material available at 10.1186/s12874-024-02285-4.

## Full-text entities

- **Diseases:** stress ulcer (MESH:D000079225)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11308151/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11308151/full.md

## References

1 reference, listed in the complete paper: https://tomesphere.com/paper/PMC11308151/full.md

---
Source: https://tomesphere.com/paper/PMC11308151