An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding
Renqi Chen, Wenwei Han, Haohao Zhang, Haoyang Su, Zhefan Wang, Xiaolei Liu, Hao Jiang, Wanli Ouyang, and Nanqing Dong

TL;DR
This paper introduces a straightforward Transformer-based framework with simple techniques like k-mer tokenization and random masking, significantly improving genomic selection performance in crop breeding over traditional methods.
Contribution
It presents a novel, easy-to-implement Transformer approach that effectively captures non-linear relationships in genomic data for crop breeding.
Findings
Transformer outperforms traditional statistical methods in GS tasks
Simple tricks like k-mer tokenization enhance model performance
The approach is robust on rice3k and wheat3k datasets
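
The summary above names k-mer tokenization and random masking but does not spell out how they are applied. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes genotypes are coded as integer SNP calls (0/1/2), that k-mers are non-overlapping, and that masking replaces token ids with a reserved `[MASK]` id. The function names, the value of `k`, and the masking probability are all hypothetical choices made for illustration.

```python
import random

def kmer_tokenize(markers, k=3, stride=None):
    """Group a marker sequence into k-mers (non-overlapping when stride is None).

    Trailing markers that do not fill a complete k-mer are dropped.
    """
    stride = stride or k
    return [tuple(markers[i:i + k]) for i in range(0, len(markers) - k + 1, stride)]

def build_vocab(tokens, specials=("[PAD]", "[MASK]")):
    """Assign an integer id to each special symbol and each distinct k-mer."""
    vocab = {s: i for i, s in enumerate(specials)}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def random_mask(ids, mask_id, p=0.15, rng=None):
    """Replace each token id with mask_id independently with probability p."""
    rng = rng or random.Random()
    return [mask_id if rng.random() < p else t for t in ids]

# Hypothetical SNP calls for one sample (assumption: 0/1/2 coding).
snps = [0, 1, 2, 0, 1, 2, 2, 2, 0]
tokens = kmer_tokenize(snps, k=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
masked = random_mask(ids, vocab["[MASK]"], p=0.15, rng=random.Random(0))
```

Grouping markers into k-mers shortens the input by roughly a factor of k, which matters when sequences are long and samples are few. Random masking acts as a simple regularizer during training. Whether the paper uses overlapping k-mers or a different masking scheme is not stated in this summary.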
Abstract
Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis. The predominant approaches in GS currently revolve around employing statistical methods for prediction. However, statistical methods often come with two main limitations: strong statistical priors and linear assumptions. A recent trend is to capture the non-linear relationships between markers by deep learning. However, as crop datasets are commonly long sequences with limited samples, the robustness of deep learning models, especially Transformers, remains a challenge. In this work, to unleash the unexplored potential of attention mechanism for the task of interest, we propose a simple yet effective Transformer-based framework that enables end-to-end training of the whole sequence. Via experiments on rice3k and wheat3k datasets, we show…
Taxonomy
Topics: Genetic Mapping and Diversity in Plants and Animals · Genetics and Plant Breeding · Genetically Modified Organisms Research
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout · Softmax
