# Evaluating Genome Assemblies for Optimized Completeness and Accuracy of Reference Gene Sequences in Wheat, Rye, and Triticale

**Authors:** Mingke Yan, Guodong Yang, Dongming Yang, Xin Zhang, Quanzhen Wang, Jinghui Gao, Chugang Mei

PMC · DOI: 10.3390/plants14071140 · 2025-04-06

## TL;DR

This paper evaluates published genome assemblies for wheat, rye, and triticale with BUSCO analysis and RNA-seq transcript mapping to identify the most complete and accurate references for gene-related studies.

## Contribution

The study identifies the best-performing reference assembly for each crop and proposes the frequency of internal stop codons as an indicator of assembly accuracy.
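As a rough illustration of the stop-codon indicator, the Python sketch below counts in-frame stop codons that occur before the last codon of each coding sequence (CDS). It assumes the standard genetic code (TAA, TAG, TGA), CDS strings that begin in reading frame 0, and a hypothetical per-gene definition of "frequency" (the fraction of CDSs carrying at least one internal stop). The function names are illustrative and are not taken from the paper.

```python
# Minimal sketch (assumptions: standard genetic code, CDS starts in frame 0,
# hypothetical per-gene definition of frequency; not the authors' pipeline).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_internal_stops(cds: str) -> int:
    """Return the number of in-frame stop codons before the final codon."""
    seq = cds.upper().replace("U", "T")
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    codons = [seq[i * 3:(i + 1) * 3] for i in range(n_codons)]
    # The last complete codon is excluded: a stop there is the expected terminator.
    return sum(1 for codon in codons[:-1] if codon in STOP_CODONS)


def internal_stop_frequency(cds_list: list[str]) -> float:
    """Fraction of CDS sequences containing at least one internal stop codon."""
    if not cds_list:
        return 0.0
    affected = sum(1 for cds in cds_list if count_internal_stops(cds) > 0)
    return affected / len(cds_list)


if __name__ == "__main__":
    # Toy sequences for illustration only; the second has a premature TAA.
    genes = ["ATGAAATAA", "ATGTAAGGCTGA", "ATGCCC"]
    print(internal_stop_frequency(genes))  # → 0.3333333333333333
```

Counting per gene rather than per codon keeps the metric comparable between assemblies with different gene counts; a per-codon rate would be an equally reasonable alternative.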

## Key findings

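- SY Mattis, Lo7, and SY Mattis plus Lo7 are the most robust genome assemblies for wheat, rye, and triticale, respectively.
- The frequency of internal stop codons correlates negatively with assembly accuracy and RNA-seq read mappability in wheat.
- Incorporating D-genome sequence into reference assemblies is recommended for triticale analyses, because D-genome introgression, translocation, and substitution frequently occur during triticale breeding.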
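The correlations reported above relate per-assembly metrics to one another. One dependency-free way to test such a monotonic relationship is Spearman's rank correlation, sketched below in Python. The choice of Spearman's rho, the function names, and the toy values are assumptions for illustration, not the authors' statistical procedure.

```python
# Minimal sketch: Spearman rank correlation between two per-assembly metrics,
# e.g. internal-stop-codon frequency vs. RNA-seq alignment rate.
# Assumption: Spearman's rho is an illustrative choice, not the paper's method.
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation computed on the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a metric is constant; correlation is undefined")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical values for four assemblies (not data from the paper).
    stop_freq = [0.02, 0.05, 0.08, 0.11]
    mapping_rate = [0.95, 0.93, 0.90, 0.86]
    print(spearman_rho(stop_freq, mapping_rate))
```

With these made-up values the two metrics are exactly reversed in rank, so the result is -1 up to floating-point rounding. A real analysis would also report a p-value, for example with a statistics library.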

## Abstract

Recent years have witnessed a surge in the publication of dozens of genome assemblies for Triticeae crops, which have significantly advanced gene-related research in wheat, rye, and triticale. However, this progress has also introduced challenges in selecting universally efficient and applicable reference genomes for genotypes with distant or ambiguous phylogenetic relationships. In this study, we assessed the completeness and accuracy of genome assemblies for wheat, rye, and triticale using comparative benchmarking universal single-copy orthologue (BUSCO) analysis and transcript mapping approaches. BUSCO analysis revealed that the proportion of complete genes positively correlated with RNA-seq read mappability, while the frequency of internal stop codons served as a significant negative indicator of assembly accuracy and RNA-seq data mappability in wheat. By integrated analysis of alignment rate, covered length, and total depth from RNA-seq data, we identified the assemblies of SY Mattis, Lo7, and SY Mattis plus Lo7 as the most robust references for gene-related studies in wheat, rye, and triticale, respectively. Furthermore, we recommend that the D genome sequence be incorporated in reference assemblies in bioinformatic analyses for triticale, as introgression, translocation, and substitution of the D genome into the triticale genome frequently occur during triticale breeding. The frequency of internal stop codons could help in evaluating the correctness of assemblies published in the future, and other findings are expected to support gene-related research in wheat, rye, triticale, and other closely related species.
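The abstract does not spell out how alignment rate, covered length, and total depth were integrated. One simple, hypothetical way to combine them is to min-max scale each metric across the candidate assemblies, average the scaled values, and sort. The Python sketch below assumes that higher is better for all three metrics and weights them equally; the assembly labels and numbers are made up.

```python
# Minimal sketch: combine RNA-seq mapping metrics into one score per assembly.
# Hypothetical: min-max scaling plus an unweighted mean is an illustrative
# choice, not the integration method stated in the paper.
from dataclasses import dataclass


@dataclass
class MappingMetrics:
    assembly: str          # hypothetical assembly label
    alignment_rate: float  # fraction of reads aligned (0-1)
    covered_length: int    # reference bases covered by >= 1 read
    total_depth: int       # summed per-base read depth


def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; a constant metric contributes 0.5 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_assemblies(rows: list[MappingMetrics]) -> list[tuple[str, float]]:
    """Return (assembly, composite score) pairs, best first."""
    scaled = [
        min_max([r.alignment_rate for r in rows]),
        min_max([float(r.covered_length) for r in rows]),
        min_max([float(r.total_depth) for r in rows]),
    ]
    # Equal-weight mean of the scaled metrics for each assembly.
    scores = [sum(col[i] for col in scaled) / len(scaled) for i in range(len(rows))]
    return sorted(zip((r.assembly for r in rows), scores),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy values for three hypothetical assemblies (not data from the paper).
    rows = [
        MappingMetrics("asm_A", 0.91, 1_200_000, 50_000_000),
        MappingMetrics("asm_B", 0.95, 1_300_000, 55_000_000),
        MappingMetrics("asm_C", 0.88, 1_100_000, 48_000_000),
    ]
    for name, score in rank_assemblies(rows):
        print(f"{name}\t{score:.3f}")
```

Min-max scaling keeps metrics with very different units (fractions, base counts, summed depth) from dominating one another; rank-based aggregation would be a more outlier-robust alternative.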

## Full-text entities

- **Species:** × Triticosecale (triticale, genus) [taxon 49317]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11991537/full.md

---
Source: https://tomesphere.com/paper/PMC11991537