# Advancing Yeast Identification Using High‐Throughput DNA Barcode Data From a Curated Culture Collection

**Authors:** Duong Vu, Michel de Vries, Bert Gerrits van den Ende, Jos Houbraken, R. Henrik Nilsson, Balázs Brankovics, Margarita Hernández‐Restrepo, Johannes Z. Groenewald, Pedro W. Crous, Ferry Hagen, Wieland Meyer, Gerard J. M. Verkley, Marizeth Groenewald

PMC · DOI: 10.1111/1755-0998.70082 · 2025-11-26

## TL;DR

This paper expands a curated ITS and LSU barcode reference set for yeasts from the CBS culture collection and proposes marker-specific similarity cutoffs, improving yeast identification in environmental and clinical metabarcoding studies.

## Contribution

The study expands a high-quality yeast DNA barcode dataset and proposes marker-specific similarity cutoffs for improved metabarcoding accuracy.

## Key findings

- An expanded dataset of 2856 ITS and 3815 LSU sequences was generated, representing 911 and 1137 yeast species, respectively; 27%–29% of these sequences derive from ex-type cultures.
- Marker-specific similarity cutoffs for ITS, ITS1, ITS2, and LSU were proposed to improve taxonomic resolution.
- Reanalysis of Human Microbiome Project data against a well-curated reference database with up-to-date taxonomy revealed how diet and environment shape the gut mycobiota.

## Abstract

Yeast identification is essential in fields ranging from microbiology and biotechnology to food science and medicine. While DNA barcoding has become the standard for identifying cultured strains, environmental DNA (eDNA) metabarcoding has revolutionised microbial community profiling, providing deeper insights into yeast communities across diverse ecosystems. A major challenge in DNA (meta)barcoding remains the limited availability of high‐quality reference sequences, which are critical for accurate species identification and comprehensive taxonomic profiling of both environmental and clinical samples. To address this gap, the Westerdijk Fungal Biodiversity Institute (WI) launched a DNA barcoding initiative in 2006 to generate high‐quality, often type‐derived ITS and LSU barcodes for all ~100,000 fungal strains preserved in the CBS culture collection, including approximately 15,000 yeasts. Building on the yeast barcode dataset released in 2016, we now present an expanded set of 2856 ITS and 3815 LSU sequences, representing 911 and 1137 yeast species, respectively. Notably, 27%–29% of these sequences are derived from ex‐type cultures. Using both newly generated and previously published barcodes, we assess the taxonomic resolution of commonly used yeast metabarcoding markers (ITS, ITS1, ITS2 and LSU) and propose marker‐specific similarity cutoffs for different yeast taxonomic groups. These results provide actionable guidance for marker selection and improve the interpretation of metabarcoding data. We further demonstrate the impact of well‐curated reference databases with up‐to‐date taxonomy by reanalyzing Human Microbiome Project data, revealing how diet and environment shape the gut mycobiota.
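To make the idea of a marker-specific similarity cutoff concrete, here is a minimal Python sketch of how such a cutoff might be applied when assigning a query barcode to a reference species. It is an illustration under stated assumptions, not the authors' pipeline: the cutoff values in `EXAMPLE_CUTOFFS` are hypothetical placeholders (the paper's proposed values are not given in this summary), the function names are invented for this example, and sequences are assumed to be pre-aligned to equal length rather than searched with a real alignment tool.

```python
from typing import Dict, Optional

# HYPOTHETICAL placeholder cutoffs for illustration only.
# These are NOT the values proposed in the paper.
EXAMPLE_CUTOFFS: Dict[str, float] = {
    "ITS": 0.98,
    "ITS1": 0.97,
    "ITS2": 0.97,
    "LSU": 0.99,
}


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching columns between two pre-aligned sequences.

    Columns where both sequences carry a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def assign_species(
    query: str,
    references: Dict[str, str],
    marker: str,
    cutoffs: Dict[str, float] = EXAMPLE_CUTOFFS,
) -> Optional[str]:
    """Return the best-matching reference species, or None when the best
    identity falls below the cutoff configured for this marker."""
    cutoff = cutoffs[marker]
    best_name: Optional[str] = None
    best_score = 0.0
    for name, seq in references.items():
        score = percent_identity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= cutoff else None


# Toy reference set with made-up sequences and species labels.
toy_refs = {"species_A": "ACGTACGTAC", "species_B": "ACGTTTTTAC"}
exact_hit = assign_species("ACGTACGTAC", toy_refs, "ITS")
weak_hit = assign_species("ACGAACGAAC", toy_refs, "ITS")
```

In this toy run the first query matches `species_A` exactly and is assigned, while the second query's best identity (0.8) falls below the placeholder ITS cutoff and is left unassigned. Keeping the cutoffs in a per-marker table is one way to let the same logic serve ITS, ITS1, ITS2 and LSU data, and the table could be extended with per-taxon keys if cutoffs differ between yeast groups.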

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12649295/full.md

---
Source: https://tomesphere.com/paper/PMC12649295