# Enhancing genomic prediction in Arabidopsis thaliana with optimized SNP subset by leveraging gene ontology priors and bin-based combinatorial optimization

**Authors:** Qingfang Ba, Heng Zhou, Zheming Yuan, Zhijun Dai

PMC · DOI: 10.3389/fbinf.2025.1607119 · Frontiers in Bioinformatics · 2025-06-18

## TL;DR

This paper introduces binGO-GS, a genomic selection framework for Arabidopsis thaliana that improves prediction accuracy by combining gene ontology (GO) priors with bin-based combinatorial selection of an SNP subset.

## Contribution

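binGO-GS is a genomic selection framework that integrates GO-based biological priors with a bin-based combinatorial SNP subset selection strategy, so that prediction models are trained on an optimized marker subset rather than the full marker set.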

## Key findings

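- With Genomic BLUP (GBLUP), binGO-GS achieves statistically significant improvements in prediction accuracy over both the full marker set and randomly selected markers on all nine quantitative traits, drawn from two A. thaliana datasets with nearly 1,000 samples and over 1.8 million SNPs.
- Similar improvements over the full marker set are observed across six additional regression models.
- Markers selected for identical or similar morphological traits show consistent patterns in quantity and genomic distribution, supporting a polygenic model of complex quantitative traits driven by minor-effect genes.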

## Abstract

With the rapid development of high-density molecular marker chips and high-throughput sequencing technologies, genomic selection/prediction (GS/GP) has been widely applied in plant breeding. Arabidopsis thaliana, as a common model organism, provides important resources for dissecting genetic variation and evolutionary mechanisms of complex traits. Quantitative traits are typically influenced by multiple minor-effect genes, which are often functionally related and can be enriched within gene ontology (GO) pathways. However, optimizing marker subsets associated with these pathways to enhance GP performance remains challenging. In this study, we propose an improved GS framework called binGO-GS by integrating GO-based biological priors with a novel bin-based combinatorial SNP subset selection strategy. We evaluated the performance of binGO-GS on nine quantitative traits from two A. thaliana datasets, comprising nearly 1,000 samples and over 1.8 million SNPs. Compared with using either the full marker set or randomly selected markers with Genomic BLUP (GBLUP), binGO-GS achieved statistically significant improvements in prediction accuracy across all traits. Similar improvements were observed across six additional regression models when applying binGO-GS instead of the full marker set. Furthermore, the selected markers for identical or similar morphological traits exhibited consistent patterns in quantity and genomic distribution, supporting the polygenic model of complex quantitative traits driven by minor-effect genes. Taken together, binGO-GS offers a powerful and interpretable approach to enhance GS performance, providing a methodological reference for accelerating plant breeding and germplasm innovation.
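The comparison the abstract describes (GBLUP on the full marker set versus GBLUP on a selected SNP subset) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are simulated. The function names, the VanRaden-style relationship matrix, the fixed shrinkage parameter `lam`, the fixed-width bins, and the use of Pearson correlation as the accuracy measure are all assumptions made for illustration. In particular, `bin_subset` is a hypothetical placeholder that simply picks bins by index; it does not reproduce the GO-based filtering or the combinatorial optimization of binGO-GS, which this summary does not detail.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix from an (n_samples, n_snps) array of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0          # allele frequency per SNP
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    z = genotypes[:, keep] - 2.0 * p[keep]    # center each SNP column
    denom = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return z @ z.T / denom


def gblup_predict(genotypes, y, train_idx, test_idx, lam=1.0):
    """Predict test phenotypes by kernel ridge regression on the relationship matrix.

    lam stands in for the residual-to-genetic variance ratio, which a real
    analysis would estimate (for example by REML); it is fixed here (assumption).
    """
    g = vanraden_grm(genotypes)
    mu = y[train_idx].mean()
    g_train = g[np.ix_(train_idx, train_idx)]
    alpha = np.linalg.solve(g_train + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + g[np.ix_(test_idx, train_idx)] @ alpha


def bin_subset(n_snps, bin_size, chosen_bins):
    """Column indices of SNPs in the chosen fixed-width bins (hypothetical placeholder rule)."""
    idx = np.arange(n_snps)
    return idx[np.isin(idx // bin_size, chosen_bins)]


# Simulated data: 120 samples, 400 SNPs, trait driven by SNPs in two bins.
rng = np.random.default_rng(0)
n_samples, n_snps = 120, 400
freqs = rng.uniform(0.1, 0.9, size=n_snps)
X = rng.binomial(2, freqs, size=(n_samples, n_snps)).astype(float)

subset = bin_subset(n_snps, bin_size=20, chosen_bins=[2, 7])
effects = rng.normal(0.0, 0.3, size=subset.size)
y = X[:, subset] @ effects + rng.normal(0.0, 1.0, size=n_samples)

train_idx = np.arange(90)
test_idx = np.arange(90, n_samples)

pred_full = gblup_predict(X, y, train_idx, test_idx)
pred_subset = gblup_predict(X[:, subset], y, train_idx, test_idx)

# Accuracy as Pearson correlation between predicted and observed values (assumption).
acc_full = np.corrcoef(pred_full, y[test_idx])[0, 1]
acc_subset = np.corrcoef(pred_subset, y[test_idx])[0, 1]
print(f"full marker set: r = {acc_full:.3f}")
print(f"selected subset: r = {acc_subset:.3f}")
```

Because the subset here is chosen with knowledge of the simulated causal bins, any accuracy gap it shows only illustrates why a well-chosen subset can help; it says nothing about the magnitudes reported in the paper.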

## Linked entities

- **Species:** Arabidopsis thaliana (taxon 3702)

## Full-text entities

- **Species:** Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12213587/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12213587/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12213587/full.md

---
Source: https://tomesphere.com/paper/PMC12213587