# Integrating omics and functional data via representation learning to prioritize candidate genes for pleiotropic effect in dairy sheep

**Authors:** Pablo Augusto de Souza Fonseca, Aroa Suárez-Vega, Laura Casas, Hector Marina, Beatriz Gutiérrez-Gil, Juan Jose Arranz

PMC · DOI: 10.1093/pnasnexus/pgaf361 · 2025-11-13

## TL;DR

This paper combines network-based machine learning with multi-omics data to prioritize genes that may influence multiple traits in dairy sheep, including milk production, milk composition, cheeseability, and mastitis resistance.

## Contribution

The study introduces a network-based machine learning approach that prioritizes genes with pleiotropic effects by combining gene-level GWAS P-values with a latent representation learned from gene co-expression networks and functional annotations.

## Key findings

- 14 and 111 genes were identified as significant for Trait_GWAS and EBV_GWAS datasets, respectively.
- Three shared genes (PHGDH, SLC1A4, and CSN3) showed pleiotropic effects across datasets.
- Prioritized genes are linked to biological processes like amino acid transport, lipid metabolism, and immune regulation.

## Abstract

The global demand for improved productivity, sustainability, welfare, and quality in livestock production presents significant challenges for breeders. Understanding trait correlations, often driven by pleiotropy, is essential for simultaneously improving traits of economic interest. Integrating multi-omics data and functional annotations can improve the disentangling of biological processes underlying the pleiotropic effect. Network-based machine learning (ML) models offer a robust solution for this integration. This study estimated gene-level P-values for pleiotropic effects using two phenotypic datasets: (i) Trait_GWAS, with phenotypic values of 12 traits covering milk production, composition, cheeseability, and mastitis resistance; and (ii) EBV_GWAS, with estimated breeding values for five similar traits, excluding cheeseability. Weighted gene co-expression networks (WGCNs) were constructed from milk somatic cell transcriptomics of Assaf ewes. Gene-term networks were built from gene ontology, metabolic pathways, and quantitative trait loci annotation for the genes in the WGCN. These networks were processed through a representation learning pipeline to create a latent vector representing gene importance. A hierarchical model integrated gene-level P-values and the latent vector, generating posterior probabilities of association for each gene. Significant results included 14 and 111 genes for Trait_GWAS and EBV_GWAS, respectively, with three shared genes (PHGDH, SLC1A4, and CSN3). Prioritized genes were linked to biological processes such as amino acid transport, lipid metabolism, mammary gland development, and immune regulation, often involving multiple biological functions. This reinforces the potential pleiotropic role of these genes. These findings highlight the utility of network-based ML models for prioritizing candidate genes with pleiotropic effects on milk, cheese, and health-related traits in dairy sheep.
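The abstract does not spell out how the hierarchical model combines gene-level P-values with the latent vector. As an illustration only, the following sketch shows one common way such an integration can work: a two-group mixture in which the prior probability of association depends on a latent network score through a logistic link. The Beta(alpha, 1) alternative density, the parameter values, the function names, and the toy genes are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' model.

```python
import math

def prior_from_latent(z, intercept=-2.0, slope=1.5):
    """Assumed logistic link: a higher latent score gives a higher prior
    probability that the gene is associated."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def posterior_association(p_value, z, alpha=0.2, intercept=-2.0, slope=1.5):
    """Posterior probability of association under an assumed two-group model.

    Null genes:     P-value ~ Uniform(0, 1), density 1.
    Non-null genes: P-value ~ Beta(alpha, 1) with alpha < 1, which places
                    more mass on small P-values.
    Requires 0 < p_value <= 1.
    """
    pi = prior_from_latent(z, intercept, slope)
    f1 = alpha * p_value ** (alpha - 1.0)  # Beta(alpha, 1) density at p_value
    f0 = 1.0                               # Uniform(0, 1) density
    return pi * f1 / (pi * f1 + (1.0 - pi) * f0)

# Hypothetical genes: (gene-level P-value, latent network score).
genes = {
    "gene_A": (1e-4, 1.2),   # strong signal, well supported by the network
    "gene_B": (1e-4, -1.0),  # strong signal, weak network support
    "gene_C": (0.5, 1.2),    # weak signal, well supported by the network
}
posteriors = {g: posterior_association(p, z) for g, (p, z) in genes.items()}

for gene, prob in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: posterior = {prob:.3f}")
```

In this toy setup, two genes with the same P-value receive different posterior probabilities when their latent scores differ, which is the general idea behind letting network-derived information inform gene prioritization.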

## Linked entities

- **Genes:** PHGDH (phosphoglycerate dehydrogenase) [NCBI Gene 26227], SLC1A4 (solute carrier family 1 member 4) [NCBI Gene 6509], CSN3 (casein kappa) [NCBI Gene 1448]

## Full-text entities

- **Genes:** SLC1A4 [NCBI Gene 101113480], CSN3 [NCBI Gene 443394], PHGDH [NCBI Gene 101111328]
- **Diseases:** mastitis (MESH:D008413)
- **Chemicals:** lipid (MESH:D008055), amino acid (MESH:D000596)
- **Species:** Ovis aries (domestic sheep, species) [taxon 9940]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12646080/full.md

---
Source: https://tomesphere.com/paper/PMC12646080