# Resolution-Adaptive Binning Enhances Machine Learning Modeling by Interbatch and Multiplatform Orbitrap-Based Shotgun Mass Spectrometry Data Integration

**Authors:** Hiu-Lok Ngan, Jialing Zhang, Kenneth Kin-Leung Kwan, Jacinth Wing-Sum Cheu, Li Zhong, Yike Guo, Xian Yang, Carmen Chak-Lui Wong, Hong Yan, Zongwei Cai

PMC · DOI: 10.1021/acs.analchem.5c05874 · 2025-11-25

## TL;DR

A mass resolution-adaptive binning method integrates Orbitrap-based shotgun mass spectrometry data across batches and platforms, yielding machine learning models that transfer better and may improve disease detection accuracy.

## Contribution

The paper introduces a mass resolution-adaptive binning and integration strategy for combining Orbitrap-based shotgun MS data across batches and platforms.

## Key findings

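- The method recovers 88–99% of ground truth features in the low mass region (70–434 m/z) from 49 mixed standard solutions at 250, 500, and 1000 ppb.
- Compared to conventional methods, it bins and integrates stably across low (100–450 m/z), mid (450–900 m/z), and high (900–1500 m/z) mass regions, resulting in better predictive models.
- In a mouse hepatocellular carcinoma proof-of-concept study, 10 generic metabolites supported disease detection across sample introduction methods: MSI on liver cryosections (F1 score = 0.87), MSI on glass smears (F1 score = 0.80), and rapid direct infusion analysis (recall = 0.89, precision = 0.63).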
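
One way to read the recovery figure in the first finding is as the fraction of known standard m/z values that have a detected feature within a small mass tolerance. The sketch below illustrates that idea only. The function name `recovery_rate`, the 5 ppm tolerance, and the nearest-neighbor matching rule are assumptions made for illustration, not the matching criteria used in the paper.

```python
import numpy as np


def recovery_rate(truth_mz, observed_mz, tol_ppm=5.0):
    """Fraction of ground-truth m/z values matched by an observed feature.

    A truth value counts as recovered when its nearest observed feature
    lies within tol_ppm (an assumed tolerance, not taken from the paper).
    """
    truth = np.asarray(truth_mz, dtype=float)
    obs = np.sort(np.asarray(observed_mz, dtype=float))
    if truth.size == 0 or obs.size == 0:
        return 0.0

    # Locate the observed neighbors on either side of each truth value.
    idx = np.searchsorted(obs, truth)
    left = obs[np.clip(idx - 1, 0, obs.size - 1)]
    right = obs[np.clip(idx, 0, obs.size - 1)]
    nearest = np.where(np.abs(truth - left) <= np.abs(truth - right), left, right)

    # Relative mass error in parts per million.
    ppm_error = np.abs(nearest - truth) / truth * 1e6
    return float(np.mean(ppm_error <= tol_ppm))


# Hypothetical values for illustration; not data from the study.
truth = [100.0, 200.0, 300.0, 400.0]
observed = [100.0002, 200.0030, 300.0005, 399.9990]
rate = recovery_rate(truth, observed, tol_ppm=5.0)
```

In this toy input, the feature near m/z 200 is about 15 ppm off and is therefore not counted, while the other three fall within tolerance.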

## Abstract

Machine learning (ML) modeling on mass spectrometry (MS)-based shotgun data facilitates feature selection and disease modeling. However, batch-specific models often struggle with limited transferability and generalizability, necessitating data integration from multiple batches and platforms. Traditional binning methods can either disintegrate or aggregate m/z features, making data combination unreliable. In this study, we introduce a mass resolution-adaptive binning and integration strategy to overcome these challenges. This approach recovers 88–99% of ground truth features in a low mass region (70–434 m/z) from 49 mixed standard solutions at 250, 500, and 1000 ppb. Compared to conventional methods, it demonstrates stable binning and integration across low (100–450 m/z), mid (450–900 m/z), and high (900–1500 m/z) mass regions, resulting in superior predictive models. Using a mouse model of hepatocellular carcinoma as a proof-of-concept study, we identify 10 generic metabolites that showcase advancements in using ambient MS imaging (MSI) data for modeling and deploy the attained model to shotgun data. This facilitates disease detection via various sample introduction methods, including MSI on liver cryosections (F1 score = 0.87) and glass smears (F1 score = 0.80), as well as rapid direct infusion analysis (recall = 0.89 and precision = 0.63). This novel mass resolution-adaptive binning and integration strategy offers a promising approach for integrating different data sets, potentially improving disease detection accuracy in MS applications.
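
The abstract does not spell out the binning algorithm, so the following is only a minimal sketch of what "resolution-adaptive" could mean in practice. It relies on the general property that Orbitrap resolving power falls off roughly as the inverse square root of m/z, so peak width grows with mass and a fixed-width bin is too wide at low mass or too narrow at high mass. The reference setting (resolving power 70,000 at m/z 200), the `width_factor` parameter, and all function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def orbitrap_fwhm(mz, ref_resolution=70_000.0, ref_mz=200.0):
    """Approximate peak width (FWHM) at a given m/z.

    Assumes resolving power R(m) = ref_resolution * sqrt(ref_mz / m);
    the reference values are illustrative, not the paper's settings.
    """
    resolution = ref_resolution * np.sqrt(ref_mz / mz)
    return mz / resolution


def adaptive_bin_edges(mz_min, mz_max, width_factor=1.0, **resolution_kwargs):
    """Build bin edges whose width tracks the local peak width."""
    edges = [float(mz_min)]
    while edges[-1] < mz_max:
        width = width_factor * float(orbitrap_fwhm(edges[-1], **resolution_kwargs))
        edges.append(edges[-1] + width)
    return np.asarray(edges)


def bin_spectrum(mz, intensity, edges):
    """Sum intensities into the bins defined by edges; drop out-of-range peaks."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    idx = np.searchsorted(edges, mz, side="right") - 1
    keep = (idx >= 0) & (idx < edges.size - 1)
    return np.bincount(idx[keep], weights=intensity[keep], minlength=edges.size - 1)


# Shared grid over the mass range quoted in the abstract (100-1500 m/z).
edges = adaptive_bin_edges(100.0, 1500.0)
widths = np.diff(edges)

# Toy check of the binning step on a hand-made three-edge grid.
toy = bin_spectrum([100.5, 100.7, 102.0, 104.0], [1.0, 2.0, 3.0, 4.0],
                   np.array([100.0, 101.0, 103.0]))
```

Because every spectrum is projected onto the same grid, spectra from different batches or instruments end up with aligned feature columns, which is the precondition for pooling them into one training set. Instruments with different resolution settings would need a common choice of `ref_resolution` (for example, the lowest among them); how the paper handles that case is not described in the abstract.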

## Linked entities

- **Diseases:** hepatocellular carcinoma (MONDO:0007256)
- **Species:** Mus musculus (taxon 10090)

## Full-text entities

- **Diseases:** hepatocellular carcinoma (MESH:D006528)
- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12874219/full.md

---
Source: https://tomesphere.com/paper/PMC12874219