# Genome-wide association mapping and haplotype analysis reveal genetic architecture of seed fatty acid compositions in 1,550 diverse soybean accessions

**Authors:** Ahmed M. Abdelghany, Shengrui Zhang, Jing Li, Bin Li, Lijuan Qiu, Junming Sun

PMC · DOI: 10.1186/s12870-025-07688-z · 2025-11-25

## TL;DR

This study identifies genetic factors influencing soybean seed fatty acid composition using genome-wide analysis of 1,550 soybean accessions.

## Contribution

The study provides a comprehensive genetic map of fatty acid traits and identifies key genes with region-specific haplotype patterns for breeding.

## Key findings

- 110,964 significant SNP-trait associations were identified across five fatty acids in soybean.
- Key genes such as GmKCS21, GmFAD2, and GmFAD3 show distinct geographic haplotype patterns across Chinese soybean ecoregions.
- Functional analysis shows that genes at the identified loci are enriched for lipid metabolic pathways, particularly glycerolipid metabolism.

## Abstract

Understanding the genetic architecture of soybean seed fatty acid (FA) composition is crucial for enhancing oil quality for nutritional and industrial applications. This study elucidates the genomic determinants of seed FA composition in soybean (Glycine max [L.] Merr.) through a comprehensive genome-wide association study (GWAS) of 1,550 diverse soybean accessions evaluated across five distinct environmental conditions. Phenotypic evaluation revealed significant genetic variability and environmental influences on the biosynthesis of five essential FAs: palmitic (PA), stearic (SA), oleic (OA), linoleic (LA), and linolenic acid (LNA). High-throughput genomic association mapping identified 110,964 significant SNP-trait associations encompassing 18,841 putative genes. Notable genetic loci included chromosomes 5 and 17, harboring GmFATB1A and GmFATB1B for PA biosynthesis; chromosomes 2 and 8, containing Glyma.02G161200 and Glyma.08G279700 associated with SA regulation; chromosomes 10, 13, and 20, with GmKCS21, GmKAS2, and GmFAD2 affecting OA concentration; chromosomes 10 and 13, with GmKCS21 and GmKAS2 influencing LA content; and chromosome 14, containing GmFAD3 controlling LNA biosynthesis. Functional annotation through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses revealed significant overrepresentation of lipid metabolic processes, particularly glycerolipid metabolic pathways. Haplotype characterization of three key regulatory genes (GmKCS21, GmFAD2, and GmFAD3) revealed distinct geographic distribution patterns across the northern, Huang-Huai-Hai, and southern ecoregions of China, with allelic frequencies that differ between improved cultivars and landraces, reflecting adaptive evolution and selection pressure during domestication and improvement.

This study provides a comprehensive genetic resource of 110,964 SNP-trait associations and functionally characterized haplotypes of key regulatory genes (GmKCS21, GmFAD2, and GmFAD3) that show ecoregion-specific allele frequency patterns, enabling marker-assisted selection strategies tailored to each soybean production ecoregion.
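
To make the haplotype comparison concrete, the following is a minimal sketch of how haplotype frequencies could be tabulated by ecoregion and by accession type (improved cultivar versus landrace). It is not the authors' pipeline: the records, the haplotype labels `Hap1` and `Hap2`, and the helper `haplotype_frequencies` are all hypothetical and serve only to illustrate the kind of summary the abstract describes.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (ecoregion, accession type, haplotype label).
# Illustrative values only; these are NOT data from the study.
records = [
    ("Northern", "cultivar", "Hap1"),
    ("Northern", "cultivar", "Hap1"),
    ("Northern", "landrace", "Hap2"),
    ("Huang-Huai-Hai", "cultivar", "Hap1"),
    ("Huang-Huai-Hai", "landrace", "Hap2"),
    ("Southern", "landrace", "Hap2"),
    ("Southern", "landrace", "Hap2"),
    ("Southern", "cultivar", "Hap1"),
]


def haplotype_frequencies(rows, key_index):
    """Return {group: {haplotype: frequency}}, grouping rows by one column.

    key_index selects the grouping column (0 = ecoregion, 1 = accession type);
    the haplotype label is always read from column 2.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key_index]][row[2]] += 1
    return {
        group: {hap: n / sum(c.values()) for hap, n in c.items()}
        for group, c in counts.items()
    }


# Frequencies within each ecoregion and within each accession type.
by_region = haplotype_frequencies(records, 0)
by_type = haplotype_frequencies(records, 1)

for region, freqs in by_region.items():
    print(region, {hap: round(f, 2) for hap, f in freqs.items()})
```

With real genotype calls, the same grouping could be applied per gene (for example, once each for GmKCS21, GmFAD2, and GmFAD3) to compare frequencies across ecoregions or between cultivars and landraces.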

The online version contains supplementary material available at 10.1186/s12870-025-07688-z.

## Linked entities

- **Chemicals:** palmitic acid (PubChem CID 985), stearic acid (PubChem CID 5281), oleic acid (PubChem CID 445639), linoleic acid (PubChem CID 5280450), linolenic acid (PubChem CID 5280934)

## Full-text entities

- **Chemicals:** fatty acid (MESH:D005227)
- **Species:** Glycine max (soybean, species) [taxon 3847]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12648889/full.md

---
Source: https://tomesphere.com/paper/PMC12648889