# UITOTO: a software for generating molecular diagnoses for species descriptions

**Authors:** Ambrosio Torres, Leshon Lee, Amrita Srivathsan, Rudolf Meier

PMC · DOI: 10.1111/cla.70023 · 2025-12-20

## TL;DR

UITOTO is new software that derives molecular diagnoses for species descriptions by identifying, testing, and visualizing diagnostic molecular combinations (DMCs).

## Contribution

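UITOTO introduces a method for generating and validating molecular diagnoses: it builds candidate diagnostic molecular combinations (DMCs) by weighted random sampling based on the Jaccard Index, derives a majority-consensus DMC when several optimal DMCs are found, and tests the result against sequence databases.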
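The consensus step can be illustrated with a minimal Python sketch. It is not UITOTO's implementation: the representation of a DMC as a mapping from alignment site to nucleotide, the function name `majority_consensus`, and the example sites are all assumptions made for illustration. The sketch keeps every (site, state) pair that occurs in more than half of the candidate DMCs.

```python
from collections import Counter


def majority_consensus(dmcs):
    """Keep each (site, state) pair found in more than half of the candidate DMCs.

    Each DMC is a hypothetical dict mapping an alignment site to a nucleotide.
    """
    counts = Counter(pair for dmc in dmcs for pair in dmc.items())
    threshold = len(dmcs) / 2
    return {
        site: state
        for (site, state), n in sorted(counts.items())
        if n > threshold
    }


# Three made-up candidate DMCs for one species.
candidates = [
    {12: "A", 45: "T", 101: "G"},
    {12: "A", 45: "T", 230: "C"},
    {12: "A", 101: "G", 230: "T"},
]

consensus = majority_consensus(candidates)
print(consensus)
```

Because each candidate contributes at most one state per site, two conflicting states at the same site can never both exceed the majority threshold, so the result is always a well-formed DMC.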

## Key findings

- On three large empirical datasets, UITOTO's DMCs outperform those from other software (e.g. MOLD) on classification metrics such as F1 Score.
- The main challenge in deriving diagnostic molecular combinations (DMCs) is balancing specificity and length: short DMCs often lack specificity, while excessively long ones fail to accommodate intraspecific variation.
- A user-friendly Shiny App GUI includes a module for producing publication-quality DMC visualizations.

## Abstract

Millions of species remain undescribed, and each eventually will require a species description with a diagnosis. Yet, we lack software that can derive state‐specific and contrastive molecular diagnoses and allows the user to validate them based on all available sequences for the taxon under study. Here we introduce UITOTO, which addresses this shortcoming by facilitating the identification, testing, and visualization of diagnostic molecular combinations (DMCs). The software uses a weighted random sampling algorithm based on the Jaccard Index for building candidate DMCs. It then selects DMCs with the highest specificity stability, meeting user‐defined thresholds for exclusive character states. If multiple optimal DMCs are identified, UITOTO derives a majority‐consensus DMC. To verify whether the generated DMCs are contrastive, UITOTO includes a validation module that tests DMCs against databases, efficiently handling thousands of aligned or unaligned sequences. Here we not only propose UITOTO but also assess its performance relative to other software that can derive DMCs (e.g. MOLD). For this purpose, we analyse three large empirical datasets: (i) Megaselia (Diptera: Phoridae: 69 species, 2229 training and 30 289 testing barcodes); (ii) Mycetophilidae (Diptera: 118 species, 1456 training, 60 349 testing barcodes); and (iii) European Lepidoptera (49 species, 591 training, 21 483 testing barcodes). Based on classification metrics (e.g. F1 Score), UITOTO's DMCs outcompete DMCs from other software. We furthermore provide guidelines for generating molecular diagnoses and a user‐friendly Shiny App‐GUI that includes a module for obtaining publication‐quality DMC visualizations. Overall, our study confirms that the biggest challenge for generating molecular and morphological diagnoses is similar: balancing specificity and length; short diagnoses often lack specificity, while excessively long DMCs are often so specific that they do not accommodate intraspecific variation.
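To make the validation idea concrete, the following Python sketch scores one DMC against a set of labelled test barcodes using the F1 Score. It is an illustration under stated assumptions, not UITOTO's validation module: sequences are assumed to be already aligned, sites are 0-based string indices, and the names `matches` and `f1_score` as well as the toy sequences are hypothetical.

```python
def matches(dmc, seq):
    """True if the aligned sequence carries every character state in the DMC."""
    return all(site < len(seq) and seq[site] == state for site, state in dmc.items())


def f1_score(dmc, sequences, target):
    """F1 Score of a DMC used as a classifier for the target species.

    `sequences` is an iterable of (species_label, aligned_sequence) pairs.
    """
    tp = fp = fn = 0
    for species, seq in sequences:
        hit = matches(dmc, seq)
        if hit and species == target:
            tp += 1  # target sequence correctly diagnosed
        elif hit:
            fp += 1  # non-target sequence wrongly matches the DMC
        elif species == target:
            fn += 1  # target sequence missed (e.g. intraspecific variation)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up DMC and test set (assumption: 0-based sites on aligned sequences).
dmc = {1: "A", 4: "T"}
test_set = [
    ("sp_a", "CAGGT"),
    ("sp_a", "CAGGT"),
    ("sp_a", "CAGGC"),  # variant within the target species
    ("sp_b", "CAGGT"),  # non-target sharing both states
    ("sp_b", "CCGGT"),
]

score = f1_score(dmc, test_set, target="sp_a")
print(round(score, 3))
```

In this toy case the DMC produces both a false positive and a false negative, which mirrors the trade-off described in the abstract: adding sites would remove the false positive but risks excluding more variants of the target species.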

## Linked entities

- **Species:** Megaselia (taxon 36165), Mycetophilidae (taxon 29035)

## Full-text entities

- **Species:** Diptera (flies, order) [taxon 7147]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12977935/full.md

---
Source: https://tomesphere.com/paper/PMC12977935