# Disentangling direct and pleiotropic SNP effects in alfalfa (Medicago sativa L.) using causal graph learning

**Authors:** Yangming Lee, Cesar A. Medina, Zhanyou Xu

PMC · DOI: 10.1038/s41598-026-35876-w · Scientific Reports · 2026-01-14

## TL;DR

This paper introduces a causal graph learning framework that separates direct from indirect (pleiotropic) SNP effects in alfalfa, making genome-wide association study (GWAS) hits easier to interpret and prioritize.

## Contribution

A novel causal graph-based framework that integrates de-confounded feature screening and structural learning to prioritize direct genetic effectors in polyploid crops.

## Key findings

- Directed acyclic graphs separate Direct Parent SNPs (DPSs) from Upstream Hub SNPs (UHSs) for four stem-related alfalfa traits.
- DPSs predict those traits more accurately than either UHSs or random SNP controls.
- Causal graph learning provides interpretable networks and improves GWAS marker prioritization in polyploid crops.
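
The distinction between direct parents and upstream hubs can be illustrated on a toy graph. The sketch below is not the authors' code: it assumes a fully directed graph (the PC algorithm can also leave some edges undirected), treats a hub as any non-parent ancestor of the trait with at least `hub_min_children` outgoing edges, and uses made-up node names. The cutoff is a hypothetical parameter, since this summary does not state how the paper defines "broad network connectivity".

```python
from collections import deque


def classify_snps(edges, trait, hub_min_children=3):
    """Split SNP nodes of a DAG into direct parents of `trait` and upstream hubs.

    edges: iterable of (parent, child) pairs.
    hub_min_children: hypothetical cutoff, not a value taken from the paper.
    """
    children, parents = {}, {}
    for p, c in edges:
        children.setdefault(p, set()).add(c)
        parents.setdefault(c, set()).add(p)

    # Direct parents: nodes with an edge pointing straight at the trait.
    direct = set(parents.get(trait, set()))

    # Breadth-first walk up the graph to collect every ancestor of the trait.
    ancestors, queue = set(), deque([trait])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in ancestors:
                ancestors.add(p)
                queue.append(p)

    # Upstream hubs: ancestors that are not direct parents but fan out widely.
    hubs = {
        n for n in ancestors - direct
        if len(children.get(n, ())) >= hub_min_children
    }
    return direct, hubs


# Toy graph with illustrative node names.
edges = [
    ("snp_hub", "snp_a"), ("snp_hub", "snp_b"), ("snp_hub", "snp_c"),
    ("snp_a", "stem_trait"), ("snp_b", "stem_trait"),
    ("snp_d", "snp_c"),
]
direct, hubs = classify_snps(edges, "stem_trait")
print("direct parents:", sorted(direct))
print("upstream hubs:", sorted(hubs))
```

In this toy graph `snp_hub` influences the trait only through `snp_a` and `snp_b`, which is the kind of indirect path along which the abstract says predictive signal gets diluted.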

## Abstract

Alfalfa (Medicago sativa L.) is a critical forage crop whose improvement depends on resolving the complex genetic architecture of agronomic traits. While genome-wide association studies (GWAS) effectively identify statistically associated markers, they often fail to distinguish direct genetic effectors from indirect or pleiotropic signals arising from linkage disequilibrium and population structure. Here, we present a causal-graph-based genomic discovery framework that integrates de-confounded feature screening with causal graph learning to infer directional dependency structures from observational genomic data. Using Double Machine Learning to control for confounding and the PC algorithm for structural learning, we construct directed acyclic graphs that distinguish Direct Parent SNPs (DPSs), representing local effectors within the Markov blanket of a trait, from Upstream Hub SNPs (UHSs), representing pleiotropic regulators with broad network connectivity. Applied to four stem-related traits in alfalfa, the framework reduces genome-wide associations to compact, interpretable, causally consistent networks. Predictive validation demonstrates that DPSs consistently outperform both UHSs and random controls, confirming their role as precise trait-specific biomarkers, while UHSs exhibit limited direct predictive power, consistent with signal dilution along causal pathways. Together, these results demonstrate that causal graph learning can act as a biologically grounded regularizer for GWAS in polyploid crops, enabling principled marker prioritization and providing a structural foundation for future multi-omics integration.
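
The de-confounding step named in the abstract can be sketched as a cross-fitted "partialling-out" estimate, the basic recipe behind Double Machine Learning. This is a minimal illustration under stated assumptions, not the paper's pipeline: ordinary least squares stands in for the machine-learning nuisance models, the data are synthetic, and all names (`dml_effect`, `structure`, `snp_direct`, `snp_spurious`) are hypothetical. Real alfalfa genotypes would be allele dosages rather than the continuous values simulated here.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Least-squares nuisance model with an intercept (stand-in for an ML learner)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta


def dml_effect(snp, trait, confounders, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of the SNP effect on the trait."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(trait)), n_folds)
    res_snp = np.empty(len(trait))
    res_trait = np.empty(len(trait))
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Residualise SNP and trait on the confounders using out-of-fold fits.
        res_snp[test] = snp[test] - ols_fit_predict(
            confounders[train], snp[train], confounders[test])
        res_trait[test] = trait[test] - ols_fit_predict(
            confounders[train], trait[train], confounders[test])
    # Slope of the residual-on-residual regression.
    return float(res_snp @ res_trait / (res_snp @ res_snp))


# Synthetic example: population structure drives both SNPs and the trait,
# but only snp_direct has a true effect (0.8) on the trait.
rng = np.random.default_rng(42)
n = 2000
structure = rng.normal(size=(n, 2))  # stand-in for population-structure covariates
snp_direct = structure[:, 0] + rng.normal(size=n)
snp_spurious = structure[:, 0] + rng.normal(size=n)
trait = 0.8 * snp_direct + 1.5 * structure[:, 0] + rng.normal(size=n)

# Naive marginal slope for the spurious SNP, inflated by shared structure.
naive_spurious = float(
    np.cov(snp_spurious, trait)[0, 1] / np.var(snp_spurious, ddof=1))
effect_direct = dml_effect(snp_direct, trait, structure)
effect_spurious = dml_effect(snp_spurious, trait, structure)

print(f"naive slope, spurious SNP:       {naive_spurious:.2f}")
print(f"de-confounded effect, direct:    {effect_direct:.2f}")
print(f"de-confounded effect, spurious:  {effect_spurious:.2f}")
```

The spurious SNP shows a strong marginal association because it shares population structure with the trait. Once both are residualised on that structure, its estimated effect should shrink toward zero while the direct SNP keeps an effect near the simulated value. In the framework described above, SNPs that survive such screening would then be passed to the PC algorithm for structure learning.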

## Full-text entities

- **Diseases:** pigmentation (MESH:D010859), Injury (MESH:D014947), death (MESH:D003643), Winter Injury (MESH:D016574)
- **Chemicals:** lignin (MESH:D008031), anthocyanin (MESH:D000872), nitrogen (MESH:D009584), sporopollenin (MESH:C009800)
- **Species:** Helianthus annuus (common sunflower, species) [taxon 4232], Medicago sativa (alfalfa, species) [taxon 3879]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12881410/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12881410/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12881410/full.md

---
Source: https://tomesphere.com/paper/PMC12881410