# ReShuffle-MS: Region-Guided Data Augmentation Improves Artificial Intelligence-Based Resistance Prediction in Escherichia coli from MALDI-TOF Mass Spectrometry

**Authors:** Dongbo Dai, Chenyang Huang, Junjie Li, Xiao Wei, Shengzhou Li, Qiong Wu, Huiran Zhang

PMC · DOI: 10.3390/microorganisms14010177 · 2026-01-13

## TL;DR

ReShuffle-MS, a region-guided data augmentation method for MALDI-TOF mass spectra, improves AI-based prediction of antibiotic resistance in E. coli when training data are scarce.

## Contribution

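Introduces ReShuffle-MS, a region-guided data augmentation framework for AMR prediction from MALDI-TOF spectra. It recombines Peripheral Peak Region (PPR) signals across same-class samples while keeping the Main Discriminative Region (MDR) intact, so the augmented spectra preserve their structure.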

## Key findings

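- ReShuffle-MS improved the average accuracy of classical machine learning models by 3.7% on a clinical dataset for E. coli levofloxacin resistance prediction.
- With ReShuffle-MS, a one-dimensional CNN achieved 83.25% accuracy and 97.28% recall on the same task.
- Grad-CAM visualization showed attention shifting from sparse, peak-dependent regions toward broader spectral patterns.
- On the external DRIAMS-C dataset, the method generalized to a different antibiotic (ceftriaxone) and a distinct laboratory setting.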
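As a reminder of how the two reported metrics relate, the sketch below computes accuracy and recall from predicted labels. It assumes the resistant class is treated as the positive class, which this summary does not state; the function name `accuracy_and_recall` and the toy labels are illustrative only and are not data from the paper.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for binary labels.

    `positive` marks the class whose recall is reported; treating
    "resistant" as positive is an assumption, not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, recall


# Toy labels (illustrative only): 1 = resistant, 0 = susceptible.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 1, 1, 0, 0]
toy_accuracy, toy_recall = accuracy_and_recall(toy_true, toy_pred)
```

In the toy case every resistant sample is caught, so recall is perfect, while two susceptible samples are misclassified and lower the accuracy. A recall well above accuracy, as in the reported CNN result, is consistent with a model that rarely misses the positive class at the cost of some false positives.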

## Abstract

Rapid antimicrobial resistance (AMR) prediction from MALDI-TOF mass spectrometry (MS) remains challenging, particularly when training artificial intelligence (AI) models under small-sample constraints. Performance is often hampered by the high dimensionality of spectral data and the subtle nature of resistance-related signals: full-spectrum approaches risk overfitting to high-dimensional noise, whereas peak-selection strategies risk discarding structurally informative, low-intensity signals. Here, we propose ReShuffle-MS, a region-guided data augmentation framework for MS data. Each spectrum is partitioned into a Main Discriminative Region (MDR) and a Peripheral Peak Region (PPR). By recombining signals within the PPR across samples of the same class while keeping the MDR intact, ReShuffle-MS generates structure-preserving augmented samples. On a clinical dataset for Escherichia coli (E. coli) levofloxacin resistance prediction, ReShuffle-MS delivered significant and consistent performance gains. It improved the average accuracy of classical machine learning models by 3.7% and enabled a one-dimensional convolutional neural network (CNN) to achieve 83.25% accuracy and 97.28% recall. Visualization using Grad-CAM revealed a shift from sparse, peak-dependent attention toward broader and more meaningful spectral patterns. Validation on the external DRIAMS-C dataset for ceftriaxone resistance further demonstrated that the method generalizes to a distinct laboratory setting and a different antibiotic target. These findings suggest that ReShuffle-MS can enhance the robustness and clinical utility of AI-based AMR prediction from routinely acquired MALDI-TOF spectra.
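The recombination step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that spectra are already binned into fixed-length intensity vectors, that the MDR is supplied as a boolean mask (the abstract does not say how the regions are determined), and that the PPR is split into contiguous chunks, each copied from a randomly chosen same-class donor. The names `reshuffle_ppr`, `mdr_mask`, and `n_segments` are hypothetical.

```python
import numpy as np


def reshuffle_ppr(spectra, labels, mdr_mask, n_aug, n_segments=4, seed=None):
    """Create augmented spectra by recombining PPR chunks within a class.

    spectra  : (n_samples, n_bins) array of binned intensities
    labels   : (n_samples,) array of class labels
    mdr_mask : (n_bins,) boolean array, True for bins in the MDR (assumed given)
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    mdr_mask = np.asarray(mdr_mask, dtype=bool)

    # PPR = every bin outside the MDR; split it into chunks (an assumption).
    ppr_bins = np.flatnonzero(~mdr_mask)
    segments = np.array_split(ppr_bins, n_segments)

    aug_x = np.empty((n_aug, spectra.shape[1]))
    aug_y = np.empty(n_aug, dtype=labels.dtype)
    for k in range(n_aug):
        base = rng.integers(len(spectra))
        same_class = np.flatnonzero(labels == labels[base])
        new = spectra[base].copy()  # MDR bins stay exactly as in the base
        for seg in segments:
            donor = rng.choice(same_class)  # donor shares the base's label
            new[seg] = spectra[donor, seg]
        aug_x[k] = new
        aug_y[k] = labels[base]
    return aug_x, aug_y


# Usage with synthetic data (random numbers, not real spectra).
demo_rng = np.random.default_rng(0)
demo_spectra = demo_rng.random((8, 50))
demo_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
demo_mask = np.zeros(50, dtype=bool)
demo_mask[20:30] = True  # pretend bins 20-29 form the MDR
aug_spectra, aug_labels = reshuffle_ppr(
    demo_spectra, demo_labels, demo_mask, n_aug=16, seed=0
)
```

Because donors are drawn only from spectra with the same label as the base sample, each augmented spectrum inherits that label, and its MDR bins are an exact copy of the base spectrum's.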

## Linked entities

- **Chemicals:** levofloxacin (PubChem CID 149096), ceftriaxone (PubChem CID 5479530)
- **Species:** Escherichia coli (taxon 562)

## Full-text entities

- **Chemicals:** levofloxacin (MESH:D064704), ceftriaxone (MESH:D002443)
- **Species:** Escherichia coli (E. coli; taxon 562)

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12844117/full.md

---
Source: https://tomesphere.com/paper/PMC12844117