# Craft: a machine learning approach to dengue subtyping

**Authors:** Daniel J van Zyl, Marcel Dunaiski, Houriiyah Tegally, Cheryl Baxter, Tulio de Oliveira, Joicymara S Xavier, Christina Riley, Anna Winters, Vivek Naranbhai, Felix Made, Salim Abdool Karim, Kennedy Otwombe, Alash'le Abimiku, Sophia Osawe, James Onyemata, Patrick Dakum, Fati Murtala-Ibrahim, Nifarta Andrew, Aminu Musa, Tolulope Adenekan, Kenneth Ewerem, Victoria Etuk, Eduan Wilkinson, Jenicca Poongavanan, Michelle Parker, Danilo Silva, Kristen A Stafford, Manhattan Charurat, Natalia Blanco, Timothy O'Connor, Meagan Fitzpatrick, Mohammad M Sajadi, Olanrewaju Lawal, Chenfeng Xiong, Weiyu Luo, Xin Wu

PMC · DOI: 10.1093/bioadv/vbaf224 · Bioinformatics Advances · 2025-10-06

## TL;DR

Craft is a machine learning tool for classifying dengue virus subtypes. It classifies faster than existing alignment-based tools while matching or surpassing their accuracy.

## Contribution

Introduces Craft (Chaos Random Forest), a machine learning framework for rapid dengue subtyping that remains accurate on sequence segments as short as 700 nucleotides.

## Key findings

- Craft achieves 99.5% accuracy on a hold-out test set, labeled by a consensus of predictions from existing tools, while processing over 140,000 sequences per minute.
- Craft maintains high accuracy even on sequence segments as short as 700 nucleotides.
- Craft classifies faster than the existing tools Genome Detective, GLUE, and Nextclade, while matching or surpassing their accuracy.
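
To make the first finding concrete, the sketch below shows one way to build consensus labels from several tools, score a classifier against them, and convert the reported throughput into an average time per sequence. The function names, the unanimous-agreement rule, and the toy labels are assumptions for illustration only; this summary does not say how the paper formed its consensus.

```python
from typing import Optional, Sequence


def consensus_label(calls: Sequence[str]) -> Optional[str]:
    """Return the shared label when every tool agrees, else None.

    Unanimous agreement is an assumed rule, not the paper's stated procedure.
    """
    if calls and all(call == calls[0] for call in calls):
        return calls[0]
    return None


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of positions where the prediction equals the reference label."""
    if len(predicted) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


def ms_per_sequence(sequences_per_minute: float) -> float:
    """Average milliseconds spent per sequence at a given throughput."""
    return 60_000.0 / sequences_per_minute


# Placeholder labels, not real lineage names: one tuple of tool calls per sequence.
tool_calls = [("A", "A", "A"), ("B", "B", "B"), ("C", "D", "C"), ("E", "E", "E")]
model_calls = ["A", "B", "C", "F"]  # hypothetical classifier output

# Keep only sequences on which all tools agree, then score the classifier.
kept = [
    (consensus_label(calls), model)
    for calls, model in zip(tool_calls, model_calls)
    if consensus_label(calls) is not None
]
truth = [label for label, _ in kept]
predicted = [model for _, model in kept]
score = accuracy(predicted, truth)

# "Over 140,000 sequences per minute" implies at most about 0.43 ms per sequence.
budget_ms = ms_per_sequence(140_000)
```

Dropping sequences on which the tools disagree keeps the reference labels clean, but it also removes the hardest cases from the evaluation. A majority vote would be an alternative rule with the opposite trade-off.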

## Abstract

The dengue virus poses a major global health threat, with nearly 390 million infections annually. A recently proposed hierarchical dengue nomenclature system enhances spatial resolution by defining major and minor lineages within genotypes, aiding efforts to track viral evolution. While current subtyping tools—Genome Detective, GLUE, and Nextclade—rely on computationally intensive sequence alignment and phylogenetic inference, machine learning presents a promising alternative for achieving accurate and rapid classification.

We present Craft (Chaos Random Forest), a machine learning framework for dengue subtyping. We demonstrate that Craft classifies faster than existing tools while matching or surpassing their accuracy. Craft achieves 99.5% accuracy on a hold-out test set formed from a consensus of predictions from existing tools and processes over 140,000 sequences per minute. Notably, Craft maintains remarkably high accuracy even when classifying sequence segments as short as 700 nucleotides.
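
The name "Chaos Random Forest" suggests alignment-free features from a chaos game representation (CGR) fed to a random forest, although this summary does not describe the encoding. As a minimal sketch under that assumption, the code below computes a frequency CGR vector in pure Python. The function name `fcgr_features`, the resolution `k`, the corner assignment, and the handling of ambiguous bases are all illustrative choices, not details taken from the paper.

```python
from typing import List

# Conventional corner assignment for the unit square (an assumed convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def fcgr_features(sequence: str, k: int = 3) -> List[float]:
    """Return a normalized frequency-CGR vector of length 4**k.

    Each base moves the current point halfway toward its corner. After k
    consecutive unambiguous bases, the grid cell holding the point is set by
    the last k bases, so cell counts correspond to k-mer counts.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5
    run = 0  # consecutive unambiguous bases seen so far
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Ambiguous base (e.g. N): restart the walk (assumed handling).
            x, y, run = 0.5, 0.5, 0
            continue
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        run += 1
        if run >= k:
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            grid[row][col] += 1
    total = sum(sum(row) for row in grid)
    flat = [count for row in grid for count in row]
    if total == 0:
        return [0.0] * len(flat)
    return [count / total for count in flat]


# Features for a short toy fragment; a real input could be a 700-nucleotide segment.
features = fcgr_features("ACGTTGCAACGT", k=2)
```

Because every sequence maps to a fixed-length vector regardless of its length, such features need no alignment and can be passed to a standard classifier, for example scikit-learn's `RandomForestClassifier`.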

Source code is available at: https://github.com/INFORM-Africa/AI-viral-lineage-classification.

## Linked entities

- **Diseases:** dengue (MONDO:0005502)

## Full-text entities

- **Diseases:** dengue (MESH:D003715), infections (MESH:D007239)
- **Species:** Dengue virus (no rank) [taxon 12637]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12527244/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12527244/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12527244/full.md

---
Source: https://tomesphere.com/paper/PMC12527244