# Selecting variant masks to improve power and replicability of gene-level burden tests

**Authors:** Trang Nguyen, Ryan Koesterer, Sean J. Jurgens, Peter Dornbos, Satoshi Yoshiji, Alex Llamas, Dongkeun Jang, Patrick Smadbeck, Annie Moriondo, Quy Hoang, Oliver Ruebenacker, Patrick Ellinor, Noël Burtt, Jason Flannick

PMC · DOI: 10.21203/rs.3.rs-6322956/v1 · Research Square · 2025-04-15

## TL;DR

This paper shows that the choice of variant masking strategy strongly affects the results of gene-level burden tests, and proposes empirically derived masking strategies that improve power and replicability.

## Contribution

The study empirically identifies mask combinations that detect twice as many significant low-frequency and rare variant associations as the "average" masking strategies used in previous studies.

## Key findings

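- The number of significant associations depends heavily on the masking strategy, ranging from 58 to 2,523 across 54 traits in 189,947 UK Biobank exomes.
- Empirically optimized masking strategies detect twice as many significant low-frequency and rare variant associations as the "average" strategies previously employed.
- Separate published analyses of the same dataset report minimally overlapping associations (<30%), a consequence of inconsistent masking.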
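
To make the overlap finding concrete, the sketch below compares two sets of significant gene-trait associations. It assumes overlap is measured as shared associations divided by the union of both sets; this summary does not state which definition the paper uses, and the gene and trait names are hypothetical placeholders.

```python
def overlap_fraction(assocs_a, assocs_b):
    """Shared associations divided by the union of both sets.

    The choice of denominator is an assumption made for illustration;
    the paper may define overlap differently.
    """
    a, b = set(assocs_a), set(assocs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical (gene, trait) pairs reported by two analyses of one dataset.
analysis_1 = {("GENE_A", "trait_1"), ("GENE_B", "trait_1"), ("GENE_C", "trait_2")}
analysis_2 = {("GENE_A", "trait_1"), ("GENE_D", "trait_2"), ("GENE_E", "trait_3")}

overlap = overlap_fraction(analysis_1, analysis_2)
```

With these toy sets, one association is shared out of five distinct ones, so two analyses can each report three hits and still agree on only a fifth of them.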

## Abstract

Rare coding variant association studies typically perform gene-level association tests in which variants are filtered (or “masked”) and aggregated based on functional annotation and allele frequency. As there is little research and no consensus regarding masking strategies to use, we investigated the impact of masking strategies on gene-level burden tests, the most widely used and interpretable type of aggregate association test. A systematic review of 234 studies catalogued 664 masks and masking strategies that rarely repeated across studies. Analyzing 54 traits within 189,947 UK Biobank exomes, we show that the number of significant associations greatly depends on the masking strategy employed (ranging from 58 to 2,523 associations) and, consequently, separate published analyses of this dataset report minimally overlapping associations (<30%). By empirically determining mask combinations that maximize the number of significant associations, we propose masking strategies that detect twice as many significant low-frequency and rare variant associations as the “average” strategies previously employed, with consistent performance across many traits. Our analyses demonstrate the inconsistency of previously used variant masking strategies and provide a simple solution to increase power and replicability in future studies.
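
As a rough illustration of the masking step the abstract describes, the sketch below filters variants by functional annotation and allele frequency and then sums the retained alternate alleles per gene for each sample. It is a minimal sketch, not the authors' pipeline: the `Variant` and `Mask` classes, the annotation labels, the frequency threshold, and the toy genotypes are all assumptions introduced here. The association test itself, which would relate each per-gene score to a trait, is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    gene: str
    annotation: str     # illustrative labels: "pLoF", "missense", "synonymous"
    allele_freq: float  # alternate allele frequency in the cohort


@dataclass(frozen=True)
class Mask:
    annotations: frozenset
    max_allele_freq: float

    def keeps(self, variant):
        # A variant passes if its annotation is allowed and it is rare enough.
        return (variant.annotation in self.annotations
                and variant.allele_freq <= self.max_allele_freq)


def burden_scores(variants, genotypes, mask):
    """Sum alternate-allele counts of masked-in variants per sample and gene.

    genotypes maps sample -> {variant_id: alternate allele count};
    missing entries are treated as zero copies.
    """
    kept = [v for v in variants if mask.keeps(v)]
    scores = {}
    for sample, calls in genotypes.items():
        per_gene = {}
        for v in kept:
            per_gene[v.gene] = per_gene.get(v.gene, 0) + calls.get(v.variant_id, 0)
        scores[sample] = per_gene
    return scores


# Hypothetical toy data.
variants = [
    Variant("v1", "GENE_A", "pLoF", 0.0001),
    Variant("v2", "GENE_A", "missense", 0.02),
    Variant("v3", "GENE_A", "synonymous", 0.0005),
    Variant("v4", "GENE_B", "missense", 0.0008),
]
genotypes = {"s1": {"v1": 1, "v2": 1}, "s2": {"v3": 2, "v4": 1}, "s3": {}}

strict = Mask(frozenset({"pLoF"}), max_allele_freq=0.001)
lenient = Mask(frozenset({"pLoF", "missense"}), max_allele_freq=0.001)

strict_scores = burden_scores(variants, genotypes, strict)
lenient_scores = burden_scores(variants, genotypes, lenient)
```

In this toy example the strict mask never scores `GENE_B`, while the lenient mask gives sample `s2` a burden of 1 there. That is the mechanism by which two masks applied to the same data can yield different sets of testable and significant genes.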

## Full-text entities

- **Genes:** PLAG1 (PLAG1 zinc finger) [NCBI Gene 5324] {aka PSA, SGPA, SRS4, ZNF912}, GPT (glutamic--pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}, PTPN11 (protein tyrosine phosphatase non-receptor type 11) [NCBI Gene 5781] {aka BPTP3, CFC, JMML, METCDS, NS1, PTP-1D}, PRG2 (proteoglycan 2, pro eosinophil major basic protein) [NCBI Gene 5553] {aka BMPG, MBP, MBP1, proMBP}, ptpn6 (protein tyrosine phosphatase non-receptor type 6) [NCBI Gene 335573] {aka Hcph, SHP1, fj22f05, wu:fj22f05, zgc:55501}, Asxl1 (ASXL transcriptional regulator 1) [NCBI Gene 228790]
- **Diseases:** type 2 diabetes (MESH:D003924), obesity (MESH:D009765), insulin resistance (MESH:D007333), myeloid diseases (MESH:D007951), BOS (MESH:D013577), inflammatory (MESH:D007249), colorectal cancer (MESH:D015179), PC (MESH:D015324)
- **Species:** Danio rerio (leopard danio, species) [taxon 7955], Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]
- **Mutations:** rs639509, A1C, rs548854, rs1029850317, rs11066309, rs62515408, rs72656010

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12047983/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12047983/full.md

## References

77 references — full list in the complete paper: https://tomesphere.com/paper/PMC12047983/full.md

---
Source: https://tomesphere.com/paper/PMC12047983