# Machine learning-based typing of Clostridium botulinum group III by FT-IR spectroscopy

**Authors:** Ilenia Drigo, Angela Guolo, Alessia Rizzardi, Miriam Cordovana, Manuel Garbuio, Elena Tonon, Marco Vedana, Luca Zandonà, Luca Bano

PMC · DOI: 10.1128/spectrum.01562-25 · Microbiology Spectrum · 2026-01-15

## TL;DR

This study shows that FT-IR spectroscopy combined with machine learning can quickly and accurately classify Clostridium botulinum strains, offering a promising new tool for tracking botulism outbreaks.

## Contribution

The study introduces machine learning applied to FT-IR spectra as a rapid, cost-effective method for typing Clostridium botulinum strains.

## Key findings

- A support vector machine with a linear kernel (Linear SVM), built on FT-IR spectra, classified C. botulinum strains by BoNT type with 97% accuracy.
- In HCA, PCA, and LDA, strains of types A, B, and F separated clearly from types C, CD, DC, and D, in line with the lineages seen in whole-genome sequencing studies.
- The IR Biotyper system proved to be a user-friendly and cost-effective tool for C. botulinum typing.
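
As a minimal sketch of how a type-level accuracy figure such as the one above can be scored, the snippet below counts correct predictions and tallies which types are confused with each other. The label lists are hypothetical toy values, not data from the paper, and the helper names `accuracy` and `confusion_counts` are illustrative; the summary does not say whether the reported accuracy came from cross-validation or from a held-out test set.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted type equals the true type."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count each (true type, predicted type) pair."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical toy labels (assumption): two strains per BoNT type.
true_types = ["C", "C", "CD", "CD", "DC", "DC", "D", "D"]
predicted_types = ["C", "C", "CD", "DC", "DC", "DC", "D", "D"]

acc = accuracy(true_types, predicted_types)
confusions = confusion_counts(true_types, predicted_types)

print(f"accuracy = {acc:.3f}")  # 7 of 8 correct -> accuracy = 0.875
print(confusions[("CD", "DC")])  # one CD strain predicted as DC -> 1
```

The per-pair counts matter for mosaic types such as CD and DC, where a single overall accuracy value can hide which types are being mistaken for one another.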

## Abstract

This study aimed to investigate the utility of Fourier-Transform Infrared Spectroscopy (FT-IRS) for differentiating Clostridium botulinum (C. botulinum) based on its botulinum neurotoxin (BoNT)-encoding gene type and its potential as an epidemiological tool for investigating botulism outbreaks. A total of 110 botulinum neurotoxin-producing clostridia (BNPC) strains, including reference strains, animal isolates, and human outbreak strains, were analyzed in four replicates using the IR Biotyper system (IRBT). Sample preparation was carried out according to the manufacturer’s instructions. Similarity analysis was performed by hierarchical cluster analysis (HCA), principal component analysis (PCA), and linear discriminant analysis (LDA). The artificial intelligence capabilities of the IRBT software were applied to develop a classifier for C. botulinum differentiation at toxin-serotype or subtype level. HCA, PCA, and LDA showed good clustering of strains belonging to the same type. In accordance with the lineages evidenced in whole-genome sequencing (WGS) studies, types A, B, and F BNPC appeared clearly separated from types C, CD, DC, and D. Considering only C, CD, DC, and D types, the highest discriminatory power was achieved in the wavenumber range 1,800–1,500 cm⁻¹, where four different clusters were detected. The support vector machine algorithm with linear kernel (Linear SVM) showed the highest discrimination accuracy at the BoNT type level (97%). Although these preliminary results need to be confirmed with a higher number of strains, the IRBT system proved to be a very promising, user-friendly, and cost-effective tool for C. botulinum typing, and the application of machine learning algorithms represents a novel approach for BNPC typing.
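
The similarity analysis described in the abstract can be illustrated with a small sketch: restrict each spectrum to the 1,800–1,500 cm⁻¹ window, average the four replicates of a strain, and compute a pairwise distance of the kind a hierarchical cluster analysis would take as input. Everything here is an assumption for illustration: the function names, the toy absorbance values, the vector normalization step, and the Euclidean metric are not taken from the paper or from the IRBT software, whose actual preprocessing is not described in this summary.

```python
from math import sqrt


def select_window(wavenumbers, absorbances, low=1500.0, high=1800.0):
    """Keep only points whose wavenumber (cm^-1) lies in [low, high]."""
    return [a for w, a in zip(wavenumbers, absorbances) if low <= w <= high]


def mean_spectrum(replicates):
    """Point-wise mean of equally long replicate spectra."""
    count = len(replicates)
    return [sum(points) / count for points in zip(*replicates)]


def vector_normalize(spectrum):
    """Scale to unit Euclidean length (assumed preprocessing step)."""
    norm = sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)


def euclidean_distance(a, b):
    """Euclidean distance between two equally long profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def strain_profile(wavenumbers, replicates):
    """Window each replicate, average the replicates, then normalize."""
    windowed = [select_window(wavenumbers, r) for r in replicates]
    return vector_normalize(mean_spectrum(windowed))


# Hypothetical toy spectra (assumption): 5 points, 4 replicates per strain.
wavenumbers = [1900.0, 1800.0, 1650.0, 1500.0, 1400.0]
strain_x = [
    [9.0, 2.0, 4.0, 0.0, 9.0],
    [9.0, 4.0, 4.0, 0.0, 9.0],
    [9.0, 3.0, 3.0, 0.0, 9.0],
    [9.0, 3.0, 5.0, 0.0, 9.0],
]
strain_y = [
    [9.0, 0.0, 4.0, 2.0, 9.0],
    [9.0, 0.0, 4.0, 4.0, 9.0],
    [9.0, 0.0, 3.0, 3.0, 9.0],
    [9.0, 0.0, 5.0, 3.0, 9.0],
]

profile_x = strain_profile(wavenumbers, strain_x)  # close to [0.6, 0.8, 0.0]
profile_y = strain_profile(wavenumbers, strain_y)  # close to [0.0, 0.8, 0.6]
distance_xy = euclidean_distance(profile_x, profile_y)
print(round(distance_xy, 4))
```

Points outside the window (here 1,900 and 1,400 cm⁻¹) do not contribute to the distance, which is why the choice of wavenumber range can change how well closely related types separate.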

Botulism outbreaks represent a significant threat to public and animal health. Rapid and accurate typing methods are essential for effective epidemiological investigations, source tracing, and the implementation of appropriate control measures. Current methods for botulinum neurotoxin serotyping are often time-consuming, expensive, and require specialized expertise. Our research demonstrated that FT-IRS, a rapid, user-friendly, and cost-effective technique already well established in microbiology for broader bacterial characterization, can be successfully adapted for this crucial task. The use of a commercially available system like the IRBT significantly enhances the potential for widespread adoption of this methodology in routine diagnostics and surveillance.

## Linked entities

- **Proteins:** BoNT (botulinum neurotoxin subtype A1)
- **Diseases:** botulism (MONDO:0005498)
- **Species:** Clostridium botulinum (taxon 1491)

## Full-text entities

- **Genes:** BoNT [NCBI Gene 46646802]
- **Diseases:** botulism (MESH:D001906)
- **Species:** Homo sapiens (human, species) [taxon 9606], Clostridia (class) [taxon 186801], Clostridium botulinum (species) [taxon 1491]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12955471/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12955471/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12955471/full.md

---
Source: https://tomesphere.com/paper/PMC12955471