# Obscured-ensemble models for genomic prediction

**Authors:** Rounak Saha, Amir Morshedian, Jia Sun, Robert Duncan, Mike Domaratzki

PMC · DOI: 10.1371/journal.pone.0334239 · 2025-11-14

## TL;DR

This paper introduces an obscured-ensemble model for genomic prediction that improves efficiency without sacrificing accuracy.

## Contribution

The proposed obscured-ensemble model predicts traits from only a subset of obscured markers, combined with each genotype's similarity to the genotypes in a training set and those genotypes' observed performance.
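The contribution combines two ideas: keeping only a subset of the markers, and scoring how similar a query genotype is to each reference genotype. The sketch below illustrates both under explicit assumptions. It codes genotypes as 0/1/2 allele dosages, draws a single random 20% marker subset shared by all references, and measures similarity as the fraction of matching codes on that subset. The function names, the coding, and the match-fraction measure are hypothetical illustrations. The paper's obscured encoding and its similarity computation may differ.

```python
import random


def sample_marker_subset(n_markers, fraction=0.2, seed=0):
    """Return sorted indices of a random subset of marker positions (hypothetical helper)."""
    rng = random.Random(seed)
    k = max(1, int(n_markers * fraction))
    return sorted(rng.sample(range(n_markers), k))


def match_similarity(query, reference, idx):
    """Fraction of selected markers where query and reference share the same code.

    Assumption: match fraction stands in for whatever similarity the paper uses.
    """
    matches = sum(1 for i in idx if query[i] == reference[i])
    return matches / len(idx)


# Toy data: genotypes coded 0/1/2 at 10 markers (hypothetical values).
references = [
    [0, 1, 2, 2, 0, 1, 1, 0, 2, 1],
    [2, 2, 0, 1, 1, 0, 2, 2, 0, 0],
]
query = [0, 1, 2, 2, 0, 1, 0, 0, 2, 1]

# Keep 20% of the markers, then score the query against every reference on that subset.
idx = sample_marker_subset(len(query), fraction=0.2)
sims = [match_similarity(query, ref, idx) for ref in references]
```

Drawing the subset once with a fixed seed keeps the selected markers identical across all reference comparisons, so the similarity scores remain comparable between references.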

## Key findings

- Genomic prediction can be performed using only 20% of the obscured markers from each reference genotype, without loss of accuracy.
- The obscured ensemble model remains successful even when only a limited number of genotypes is used for prediction, and a randomly selected subset of genotypes is sufficient.
- Eliminating markers shows that the model does not rely on genomic linkage to perform shortcut learning.

## Abstract

Genomic Prediction (GP) uses dense whole-genome marker sets from lines of a crop to predict agronomic traits for untested genotypes. In recent years, deep learning (DL) approaches for genomic prediction have demonstrated state-of-the-art results. However, substantial variation exists in DL outcomes for GP as the success of DL is dependent on the architecture of the model used, as well as the amount of data available and the population structure of the individuals in the training set. In this paper, we consider an obscured model for GP, where the model is not provided with genomic content. The obscured model was intended to evaluate the possibility of so-called shortcut learning in GP. We conclude that we can perform GP using the obscured model with only 20% of the obscured markers from each reference genotype. This selective feature usage significantly enhances the efficiency of our model without compromising accuracy. By eliminating markers, we demonstrate that the model is not relying on linkage to perform shortcut learning. Further, we consider a deep learning ensemble method for genomic prediction based on the obscured model. The ensemble model we develop here shows success as a method for GP by using the similarity to each of the elements of a training set of genotypes, as well as the performance of the genotypes. We evaluate the obscured ensemble model for GP. We demonstrate that the obscured ensemble model is successful even with a limited number of genotypes used for prediction. Further, random selection of a subset of genotypes is sufficient to ensure successful performance.
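The abstract describes an ensemble that predicts from each training genotype's similarity to the query and that genotype's observed performance, and reports that a random subset of training genotypes suffices. The sketch below is a minimal, non-deep-learning stand-in for that aggregation step. It assumes each ensemble member contributes its reference genotype's trait value, weighted by a precomputed similarity score. The weighting scheme, the function `ensemble_predict`, and all values are hypothetical. The authors' ensemble members are deep-learning models, not this weighted average.

```python
import random


def ensemble_predict(similarities, phenotypes, n_members, seed=0):
    """Similarity-weighted mean of trait values over a random subset of references.

    Hypothetical stand-in: each sampled reference genotype acts as one ensemble
    member whose observed trait value is weighted by its similarity to the query.
    """
    if len(similarities) != len(phenotypes):
        raise ValueError("similarities and phenotypes must have the same length")
    rng = random.Random(seed)
    # Randomly choose which reference genotypes participate in the ensemble.
    members = rng.sample(range(len(phenotypes)), n_members)
    total = sum(similarities[i] for i in members)
    if total == 0:
        # No similarity signal: fall back to the unweighted mean of the sampled members.
        return sum(phenotypes[i] for i in members) / n_members
    return sum(similarities[i] * phenotypes[i] for i in members) / total


# Hypothetical similarity scores and trait values for four reference genotypes.
sims = [0.9, 0.5, 0.1, 0.7]
phenotypes = [42.0, 38.0, 30.0, 40.0]
prediction = ensemble_predict(sims, phenotypes, n_members=3)
```

Sampling members at random mirrors the abstract's observation that a random subset of genotypes suffices. The fixed seed only makes the toy result reproducible.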

## Full-text entities

- **Chemicals:** oil (MESH:D009821), canola (MESH:D000074262)
- **Species:** Homo sapiens (human, species) [taxon 9606], Brassica napus (oilseed rape, species) [taxon 3708], Bos taurus (bovine, species) [taxon 9913], Brassica napus var. napus (annual rape, varietas) [taxon 138011]

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12617858/full.md

---
Source: https://tomesphere.com/paper/PMC12617858