ReShuffle-MS: Region-Guided Data Augmentation Improves Artificial Intelligence-Based Resistance Prediction in Escherichia coli from MALDI-TOF Mass Spectrometry
Dongbo Dai, Chenyang Huang, Junjie Li, Xiao Wei, Shengzhou Li, Qiong Wu, Huiran Zhang

TL;DR
A new data augmentation method called ReShuffle-MS improves AI predictions of antibiotic resistance in E. coli using MALDI-TOF mass spectrometry data.
Contribution
Introduces ReShuffle-MS, a region-guided data augmentation framework that enhances AI performance for AMR prediction from MALDI-TOF spectra.
Findings
ReShuffle-MS improved classical machine learning accuracy by 3.7% on E. coli levofloxacin resistance prediction.
A one-dimensional CNN achieved 83.25% accuracy and 97.28% recall using ReShuffle-MS.
The method generalized to a different antibiotic (ceftriaxone) and laboratory setting.
Abstract
Rapid antimicrobial resistance (AMR) prediction from MALDI-TOF mass spectrometry (MS) remains challenging, particularly when training artificial intelligence (AI) models under small-sample constraints. Performance is often hampered by the high dimensionality of spectral data and the subtle nature of resistance-related signals: full-spectrum approaches risk overfitting to high-dimensional noise, whereas peak-selection strategies risk discarding structurally informative, low-intensity signals. Here, we propose ReShuffle-MS, a region-guided data augmentation framework for MS data. Each spectrum is partitioned into a Main Discriminative Region (MDR) and a Peripheral Peak Region (PPR). By recombining signals within the PPR across samples of the same class while keeping the MDR intact, ReShuffle-MS generates structure-preserving augmented samples. On a clinical dataset for Escherichia coli…
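The recombination step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes spectra are already binned into fixed-length intensity vectors and that the MDR is supplied as a boolean mask (the abstract does not say how the MDR is determined). The function name `reshuffle_augment`, the parameters `n_aug_per_sample` and `n_segments`, and the choice to swap contiguous PPR chunks between randomly chosen same-class donors are all hypothetical.

```python
import numpy as np


def reshuffle_augment(spectra, labels, mdr_mask, n_aug_per_sample=1,
                      n_segments=4, seed=0):
    """Illustrative sketch only; not the authors' implementation.

    spectra  : (n_samples, n_bins) binned intensities (assumed input format)
    labels   : (n_samples,) class labels, e.g. resistant / susceptible
    mdr_mask : (n_bins,) boolean, True for bins in the MDR (assumed given)
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    mdr_mask = np.asarray(mdr_mask, dtype=bool)

    # Every bin outside the MDR is treated as part of the PPR.
    ppr_idx = np.flatnonzero(~mdr_mask)
    # Hypothetical design choice: recombine the PPR in contiguous chunks.
    segments = [s for s in np.array_split(ppr_idx, n_segments) if s.size > 0]

    aug_x, aug_y = [], []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        for base in cls_idx:
            for _ in range(n_aug_per_sample):
                new = spectra[base].copy()  # MDR bins stay those of the base
                for seg in segments:
                    # Donor is drawn from the same class (may be the base itself).
                    donor = rng.choice(cls_idx)
                    new[seg] = spectra[donor, seg]
                aug_x.append(new)
                aug_y.append(cls)
    return np.array(aug_x), np.array(aug_y)
```

Under these assumptions, each augmented spectrum keeps the MDR of its base sample unchanged, and every PPR value is copied from some sample carrying the same label, so class labels remain valid for the generated samples.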
Taxonomy
Topics: Bacterial Identification and Susceptibility Testing · Mass Spectrometry Techniques and Applications · Machine Learning in Materials Science
