# A Penalized Regression Method for Genomic Prediction Reduces Mismatch between Training and Testing Sets

**Authors:** Osval A. Montesinos-López, Cristian Daniel Pulido-Carrillo, Abelardo Montesinos-López, Jesús Antonio Larios Trejo, José Cricelio Montesinos-López, Afolabi Agbona, José Crossa

PMC · DOI: 10.3390/genes15080969 · Genes · 2024-07-23

## TL;DR

This paper proposes a weighted penalized regression method that improves genomic selection accuracy by down-weighting the features that most strongly distinguish the training set from the testing set.

## Contribution

A novel weighted regression approach using binary-Lasso to reduce training-testing mismatches in genomic prediction.

## Key findings

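- Across six datasets, the method consistently lowered the normalized root mean square error (NRMSE).
- Weights derived from binary-Lasso β coefficients (the inverse of their absolute values) were applied in Lasso, Ridge, and Elastic Net models, reducing the influence of features that differ between training and testing sets.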
- The glmnet library enables easy implementation of the proposed weighting method.
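
The normalized root mean square error named in the findings can be computed in several ways, and this summary does not state which normalization the paper uses. The sketch below assumes one common convention, RMSE divided by the mean of the observed values; the function name `nrmse` is illustrative and does not come from the paper.

```python
import math

def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    if mean_obs == 0:
        raise ValueError("mean of observed values is zero; NRMSE is undefined")
    return math.sqrt(mse) / abs(mean_obs)
```

Lower values indicate better predictions, and dividing by the mean makes errors comparable across traits measured on different scales.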

## Abstract

Genomic selection (GS) is changing plant breeding by significantly reducing the resources needed for phenotyping. However, its accuracy can be compromised by mismatches between training and testing sets, which impact efficiency when the predictive model does not adequately reflect the genetic and environmental conditions of the target population. To address this challenge, this study introduces a straightforward method using binary-Lasso regression to estimate β coefficients. In this approach, the response variable assigns 1 to testing set inputs and 0 to training set inputs. Subsequently, Lasso, Ridge, and Elastic Net regression models use the inverse of these β coefficients (in absolute values) as weights during training (WLasso, WRidge, and WElastic Net). This weighting method gives less importance to features that discriminate more between training and testing sets. The effectiveness of this method is evaluated across six datasets, demonstrating consistent improvements in terms of the normalized root mean square error. Importantly, the model’s implementation is facilitated using the glmnet library, which supports straightforward integration for weighting β coefficients.
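
The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation. Several details are assumptions: the weight form `1 / (1 + |β|)` (chosen so that zero coefficients give a finite weight), the use of a per-feature ridge penalty multiplier equal to the inverse weight, the hand-written solvers, and all function and variable names. In R's glmnet, per-feature penalty multipliers are passed through the `penalty.factor` argument; whether the paper uses exactly that route is not stated in this summary.

```python
import numpy as np

def binary_lasso(X, z, lam=0.05, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA).

    z is 1 for testing-set rows and 0 for training-set rows.
    """
    n, p = X.shape
    beta, intercept = np.zeros(p), 0.0
    # Step size 1/L, where L bounds the curvature of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        resid = 1.0 / (1.0 + np.exp(-(X @ beta + intercept))) - z
        intercept -= step * resid.mean()
        beta = beta - step * (X.T @ resid) / n
        # Soft-thresholding is the proximal step for the L1 penalty.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return beta

def weighted_ridge(X, y, penalty_factor, lam=20.0):
    """Ridge regression with a separate penalty multiplier per feature."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.diag(penalty_factor), Xc.T @ yc)

# Synthetic data: feature 0 is shifted in the testing set (a deliberate mismatch).
rng = np.random.default_rng(0)
n_train, n_test, p = 80, 40, 10
X = rng.normal(size=(n_train + n_test, p))
X[n_train:, 0] += 2.0
z = np.r_[np.zeros(n_train), np.ones(n_test)]  # 0 = training row, 1 = testing row
X_train = X[:n_train]
y_train = X_train @ np.linspace(1.0, 0.1, p) + rng.normal(scale=0.5, size=n_train)

# Stage 1: find which features discriminate training rows from testing rows.
beta_disc = binary_lasso(X, z)

# Stage 2: turn the coefficients into weights (assumed form) and penalize
# strongly discriminating features more heavily in the prediction model.
weights = 1.0 / (1.0 + np.abs(beta_disc))
b_plain = weighted_ridge(X_train, y_train, np.ones(p))
b_weighted = weighted_ridge(X_train, y_train, 1.0 / weights)
```

In this toy setup the shifted feature receives the smallest weight, so its coefficient in the weighted fit is shrunk more than in the unweighted fit. The same weights could be fed to Lasso or Elastic Net variants in place of the ridge step.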

## Full-text entities

- **Diseases:** injury to people or property (MESH:C000719191), GDD (OMIM:166260), GS (MESH:D042822)
- **Chemicals:** oil (MESH:D009821), GS (-)
- **Species:** Glycine max (soybean, species) [taxon 3847], Solanum lycopersicum (tomato, species) [taxon 4081], Manihot esculenta (cassava, species) [taxon 3983], Solanum tuberosum (potatoes, species) [taxon 4113], Oryza sativa (Asian cultivated rice, species) [taxon 4530]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11353568/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11353568/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC11353568/full.md

---
Source: https://tomesphere.com/paper/PMC11353568