# VarLand: A pipeline to map the structural landscape of missense variants at the proteome scale

**Authors:** Francisco J. Guzmán-Vega, Kelly J. Cardona-Londoño, Ana C. González-Álvarez, Karla A. Peña-Guerra, Azza Althagafi, Tanisha Khan, Robert Hoehndorf, Stefan T. Arold

PMC · DOI: 10.1016/j.jbc.2025.111071 · The Journal of Biological Chemistry · 2025-12-17

## TL;DR

VarLand is a computational pipeline that uses AlphaFold-predicted protein structures to profile missense variants and clarify how they contribute to disease mechanisms.

## Contribution

VarLand introduces a multidimensional structural profiling approach, combining 29 structural and biophysical features per variant, to assess missense variant pathogenicity at the proteome scale.
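One way to picture "multidimensional" profiling is to compare, feature by feature, how pathogenic and benign variants are distributed. The sketch below uses Cliff's delta as a rank-based effect size. This is an illustrative choice, not necessarily the statistic VarLand uses, and the feature names (`plddt`, `contacts`, `rsa`) and values are hypothetical.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs.

    +1 means every x exceeds every y; -1 means the opposite; 0 means no shift.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def profile_shift(pathogenic, benign):
    """Return {feature: delta} for every feature present in both groups.

    Both arguments map a (hypothetical) feature name to a list of per-variant
    values; a positive delta means the feature tends to be higher in the
    pathogenic group.
    """
    shared = sorted(set(pathogenic) & set(benign))
    return {f: cliffs_delta(pathogenic[f], benign[f]) for f in shared}


# Toy usage with made-up values: higher pLDDT and contact counts, lower
# relative solvent accessibility (rsa) in the pathogenic group.
pathogenic = {"plddt": [92, 88, 95], "contacts": [14, 12, 16], "rsa": [0.05, 0.10, 0.30]}
benign = {"plddt": [40, 55, 90], "contacts": [5, 6, 13], "rsa": [0.60, 0.45, 0.20]}
print(profile_shift(pathogenic, benign))
```

Reading the deltas side by side gives a compact "profile" of how two variant sets differ across many features at once, and the same comparison can be repeated per functional class or disease category.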

## Key findings

- Pathogenic variants are enriched in ordered, buried regions with high intramolecular contact density, whereas benign variants preferentially occur in disordered, solvent-exposed regions.
- Structural feature profiles vary across protein functional classes and disease categories, suggesting differences in underlying disease mechanisms.
- AlphaMissense variants show a stronger association between structural order and pathogenicity than clinical datasets, indicating residual bias from structure-centric training.
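Enrichment of pathogenic variants in ordered regions can be framed as a 2×2 contingency table (pathogenic vs benign, ordered vs disordered). The sketch below computes the odds ratio and a one-sided Fisher exact p-value from the hypergeometric distribution using only `math.comb`. It assumes each variant has already been labeled ordered or disordered upstream; the function name and the counts in the example are hypothetical, and this is not presented as VarLand's own statistical procedure.

```python
import math


def ordered_enrichment(path_ordered, path_disordered, benign_ordered, benign_disordered):
    """Odds ratio and one-sided Fisher exact p-value for enrichment of
    pathogenic variants in ordered regions.

    Table layout:            ordered   disordered
        pathogenic              a          b
        benign                  c          d
    """
    a, b, c, d = path_ordered, path_disordered, benign_ordered, benign_disordered

    # Odds ratio (a*d)/(b*c); undefined cells give inf or nan instead of an error.
    if b * c == 0:
        odds_ratio = math.inf if a * d > 0 else math.nan
    else:
        odds_ratio = (a * d) / (b * c)

    # Hypergeometric tail P(X >= a) with all margins fixed.
    n_total = a + b + c + d
    n_ordered = a + c          # column total: variants in ordered regions
    n_path = a + b             # row total: pathogenic variants
    denom = math.comb(n_total, n_path)
    tail = sum(
        math.comb(n_ordered, k) * math.comb(n_total - n_ordered, n_path - k)
        for k in range(a, min(n_ordered, n_path) + 1)
    )
    return odds_ratio, tail / denom


# Hypothetical counts: 8/10 pathogenic vs 3/10 benign variants in ordered regions.
print(ordered_enrichment(8, 2, 3, 7))
```

Running the same function once per pathogenic dataset (for example ClinVar and AlphaMissense) against the same benign gnomAD set yields one odds ratio per dataset, which is one simple way to compare how strongly structural order tracks pathogenicity in each.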

## Abstract

Missense variant pathogenicity often arises from disruptions to protein structural features. The integration of large-scale genetic sequencing into clinical workflows and the availability of accurate artificial intelligence-based protein structure predictions present an opportunity to assess the structure–function relationship of missense variants at a population scale. To harness this potential, we developed VarLand, a computational pipeline that extracts 29 structural and biophysical features from AlphaFold-predicted protein models and nine complementary annotation tools. We applied VarLand to pathogenic missense variants from ClinVar and to a population-specific dataset of rare Middle Eastern variants, comparing their feature profiles with those of high-frequency benign variants from the Genome Aggregation Database (gnomAD). Our analysis confirms that pathogenic variants are significantly enriched in ordered regions, buried residues, and sites with high intramolecular contact density, whereas benign variants preferentially occur in disordered, solvent-exposed regions. However, VarLand also uncovered variations in the feature landscape across protein functional classes and disease categories, suggesting differences in underlying disease mechanisms. Furthermore, variants from the artificial intelligence-based AlphaMissense database showed a stronger association between structural order and pathogenicity than clinical datasets, indicating residual bias from structure-centric training. These findings demonstrate that multidimensional structural profiling with VarLand can uncover not only broad structure–pathogenicity relationships but also dataset-specific and class-specific deviations, offering deeper insight into disease mechanisms.
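AlphaFold Protein Structure Database models store the per-residue pLDDT confidence score in the B-factor column of their PDB files, so two of the features discussed above, structural order and contact density, can be approximated directly from a model. The sketch below is a minimal illustration under stated assumptions: residues with pLDDT ≥ 70 are treated as "ordered" (the lower bound of AlphaFold's "confident" band), and contact density is proxied by the number of Cα atoms within 8 Å, excluding residues within two positions in sequence. The function names and both thresholds are hypothetical; VarLand's 29 features and their exact definitions are described in the full paper, not here.

```python
import math
from dataclasses import dataclass


@dataclass
class Residue:
    number: int
    plddt: float                       # read from the B-factor column
    ca: tuple[float, float, float]     # Cα coordinates in Å


def parse_ca_atoms(pdb_text, chain="A"):
    """Collect Cα atoms of one chain from fixed-column PDB ATOM records."""
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[21] != chain:
            continue
        number = int(line[22:26])
        ca = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[number] = Residue(number, float(line[60:66]), ca)
    return residues


def structural_context(residues, position, cutoff=8.0, min_seq_sep=3, ordered_plddt=70.0):
    """Hypothetical per-variant features: pLDDT, order flag, Cα contact count."""
    target = residues[position]
    contacts = sum(
        1
        for other in residues.values()
        if abs(other.number - position) >= min_seq_sep   # skip self and sequence neighbours
        and math.dist(target.ca, other.ca) <= cutoff
    )
    return {
        "plddt": target.plddt,
        "ordered": target.plddt >= ordered_plddt,
        "ca_contacts": contacts,
    }
```

Usage would be `structural_context(parse_ca_atoms(open(path).read()), 42)` for a variant at residue 42 of chain A. A Cα-only contact count is a deliberately coarse proxy; side-chain atom contacts or solvent accessibility would be natural refinements.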

## Full-text entities

- **Genes:** OR2L3 (olfactory receptor family 2 subfamily L member 3) [NCBI Gene 391192], STAT1 (signal transducer and activator of transcription 1) [NCBI Gene 6772] {aka CANDF7, IMD31A, IMD31B, IMD31C, ISGF-3, STAT91}, VWF (von Willebrand factor) [NCBI Gene 7450] {aka F8VWF, VWD}, MMRN1 (multimerin 1) [NCBI Gene 22915] {aka ECM, EMILIN4, GPIa*, MMRN}
- **Diseases:** von Willebrand disease type 1 (MESH:D056725), vascular injury (MESH:D057772), diseases (MESH:D004194), neurological (MESH:D009461), metabolic (MESH:D008659), genetic (MESH:D030342), neurological diseases (MESH:D020271), musculoskeletal (MESH:D009140)
- **Chemicals:** proline (MESH:D011392)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12816909/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12816909/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12816909/full.md

---
Source: https://tomesphere.com/paper/PMC12816909