# Disc-Hub: a python package for benchmarking machine learning strategies in DIA-MS identification

**Authors:** Yiwen Yu, Xiaohui Wu, Jian Song

PMC · DOI: 10.1093/bioadv/vbaf232 · Bioinformatics Advances · 2025-09-30

## TL;DR

Disc-Hub is a Python package that helps evaluate and compare machine learning methods for analyzing DIA-MS data, improving peptide identification accuracy and reliability.

## Contribution

Disc-Hub introduces a benchmarking framework for comparing machine learning strategies in DIA-MS identification.

## Key findings

- K-fold training with a multilayer perceptron achieved the best balance between identification depth and FDR control.
- Disc-Hub enables rapid selection of optimal machine learning configurations for DIA identification algorithms.
- The package provides open access to datasets and code for reproducible benchmarking.

## Abstract

Accurate analysis of data-independent acquisition (DIA) mass spectrometry data relies on machine learning to distinguish target peptides from decoy peptides. Different DIA identification engines adopt distinct binary classifiers and training workflows to accomplish this learning task. However, systematic comparisons of how different machine learning strategies affect identification performance are lacking. This absence of evaluation hinders optimal learning strategy selection, increases the risk of model underfitting or overfitting, and ultimately undermines the effectiveness and reliability of false discovery rate (FDR) control.
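To make the link between classifier scores and FDR control concrete, here is a minimal sketch of target-decoy q-value estimation. It is not taken from Disc-Hub: the function name `q_values`, the decoys/targets ratio as the FDR estimate, and the toy scores are all assumptions for illustration.

```python
from typing import List


def q_values(scores: List[float], is_decoy: List[bool]) -> List[float]:
    """Estimate a q-value per item from discriminant scores (higher = better).

    Assumption: FDR at a score threshold is estimated as
    (#decoys above threshold) / (#targets above threshold), capped at 1.
    """
    # Rank items from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdr_at_rank = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr_at_rank.append(min(1.0, decoys / max(targets, 1)))

    # q-value: the smallest FDR at which this item would still be accepted,
    # i.e. the running minimum taken from the worst rank upwards.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[rank])
        q[order[rank]] = running_min
    return q


# Hypothetical toy data: six scored precursors, three of them decoys.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_is_decoy = [False, False, True, False, True, True]
toy_q = q_values(toy_scores, toy_is_decoy)

# Identification depth at 1% FDR = number of targets with q <= 0.01.
depth = sum(1 for q, d in zip(toy_q, toy_is_decoy) if not d and q <= 0.01)
```

An overfitted classifier tends to score targets it was trained on too highly, which deflates this decoy-based estimate; an underfitted one separates the two classes poorly and reduces identification depth.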

In this study, we benchmarked three training strategies and four classifiers on representative DIA datasets. Among them, K-fold training combined with a multilayer perceptron achieved the best balance between identification depth and FDR control. We have released the datasets and code through the Python package Disc-Hub, enabling rapid selection of optimal machine learning configurations for developing DIA identification algorithms.
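One common reading of K-fold training is that every item is scored by a model that never saw it during training. The sketch below illustrates that idea only; it is not the Disc-Hub API. The helper names `fit_mean_difference` and `kfold_cross_score` are hypothetical, and the mean-difference scorer is a deliberately simple stand-in for the multilayer perceptron described in the abstract.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def fit_mean_difference(X: List[Vector], y: List[int]) -> Scorer:
    """Toy stand-in classifier (NOT an MLP).

    Scores a row by projecting it onto mean(target) - mean(decoy).
    Labels: 1 = target, 0 = decoy. Assumes both classes are present in X.
    """
    dim = len(X[0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    w = [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(dim)]
    return lambda row: sum(wj * v for wj, v in zip(w, row))


def kfold_cross_score(
    X: List[Vector],
    y: List[int],
    k: int = 5,
    fit: Callable[[List[Vector], List[int]], Scorer] = fit_mean_difference,
    seed: int = 0,
) -> List[float]:
    """Score each item with a model trained on the other k-1 folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint folds covering all items

    scores = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = model(X[i])  # item i was never used to fit this model
    return scores
```

Because `fit` is a parameter, the same loop could wrap any classifier that returns a scoring function, which is the kind of strategy-by-classifier grid a benchmark would sweep. The held-out scores could then be passed to target-decoy FDR estimation.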

Disc-Hub is released as open-source software and can be installed from PyPI as a Python module. The source code is available on GitHub at https://github.com/yuyiwen-yiyuwen/Disc_Hub.

## Full-text entities

- **Diseases:** DIA (MESH:D064129)
- **Chemicals:** Beta-DIA (-), methionine (MESH:D008715)
- **Species:** Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** HeLa — Homo sapiens (Human), Human papillomavirus-related endocervical adenocarcinoma, Cancer cell line (CVCL_0030)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12597894/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12597894/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12597894/full.md

---
Source: https://tomesphere.com/paper/PMC12597894