# Frag’n’Flow: automated workflow for large-scale quantitative proteomics in high performance computing environments

**Authors:** Istvan Szepesi-Nagy, Roberta Borosta, Zoltan Szabo, Gabor E. Tusnady, Lorinc S. Pongor, Gergely Rona

PMC · DOI: 10.1186/s12859-025-06305-y · 2026-01-04

## TL;DR

Frag’n’Flow is an automated pipeline that makes large-scale proteomics data analysis faster and easier using high-performance computing.

## Contribution

Frag’n’Flow combines FragPipe and Nextflow to automate and optimize large-scale proteomics analysis in high-performance computing (HPC) environments.

## Key findings

- Compared with other workflow-based solutions, Frag’n’Flow cuts runtime nearly in half on a typical ~58 GB DIA dataset while maintaining quantitative accuracy.
- Results from label-free DDA, DIA, and TMT datasets recapitulated published biological signatures with minimal user input.
- It alleviates memory and I/O bottlenecks, enabling efficient analysis of large MS datasets.

## Abstract

Analysing large-scale, complex mass spectrometry (MS)-based proteomics datasets often overwhelms desktop computational resources and requires manual configuration. While FragPipe delivers rapid peptide identification across diverse sample preparation and acquisition modes (DDA, DIA, TMT), it remains challenging to deploy at scale.

We introduce Frag’n’Flow, a Nextflow‐based pipeline that encapsulates FragPipe, automates input manifest and workflow generation, manages tool dependencies, and includes downstream data analysis options to enable reproducible, high‐performance analyses on HPC, cloud, and cluster environments. Benchmarking against other workflow-based solutions shows that our pipeline maintains quantitative accuracy and cuts runtime nearly in half on a typical DIA dataset of ~58 GB, while alleviating memory and I/O bottlenecks. We validate Frag’n’Flow results on three representative datasets (label-free DDA, DIA, and TMT), successfully recapitulating published biological signatures with minimal user intervention.
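One step the pipeline automates is building the input manifest that tells FragPipe which experiment, replicate, and acquisition mode each raw file belongs to. The sketch below is a minimal illustration of that idea, not Frag’n’Flow's own code. It assumes the four-column, tab-separated, headerless layout used by FragPipe `.fp-manifest` files (path, experiment, bioreplicate, data type). The file-naming convention `<experiment>_<replicate>.<ext>` and the helpers `build_manifest` and `write_manifest` are hypothetical.

```python
import csv
import io
from pathlib import PurePath


def build_manifest(files, data_type="DIA"):
    """Return manifest rows (path, experiment, bioreplicate, data type).

    Assumes a hypothetical naming convention <experiment>_<replicate>.<ext>,
    e.g. "control_1.mzML"; files that do not match raise ValueError.
    """
    rows = []
    for f in sorted(files):
        stem = PurePath(f).stem                      # "control_1.mzML" -> "control_1"
        experiment, _, replicate = stem.rpartition("_")
        if not experiment or not replicate.isdigit():
            raise ValueError(f"cannot parse experiment/replicate from {f!r}")
        rows.append((f, experiment, replicate, data_type))
    return rows


def write_manifest(rows):
    """Serialize rows as tab-separated text with no header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    files = ["/data/treated_1.mzML", "/data/control_2.mzML", "/data/control_1.mzML"]
    print(write_manifest(build_manifest(files)), end="")
```

Parsing metadata from file names keeps the example short. A real deployment would more likely read a user-supplied sample sheet, because naming schemes differ between labs and public repositories.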

By combining the sensitivity and speed of FragPipe with Nextflow’s orchestration, Frag’n’Flow enables the analysis of large‐scale proteomics data and empowers researchers without extensive computational expertise to extract new insights from existing MS datasets. Frag’n’Flow is available at: https://github.com/ronalabrcns/FragNFlow.

The online version contains supplementary material available at 10.1186/s12859-025-06305-y.

## Full-text entities

- **Genes:** COX6C (cytochrome c oxidase subunit 6C) [NCBI Gene 1345], ESR1 (estrogen receptor 1) [NCBI Gene 2099] {aka ER, ESR, ESRA, ESTRR, Era, NR3A1}, IFNG (interferon gamma) [NCBI Gene 3458] {aka IFG, IFI, IMD69}, KIF5C (kinesin family member 5C) [NCBI Gene 3800] {aka CDCBM2, KINN, NKHC, NKHC-2, NKHC2}, ICAM1 (intercellular adhesion molecule 1) [NCBI Gene 3383] {aka BB2, CD54, P3.58}, CNTNAP2 (contactin associated protein 2) [NCBI Gene 26047] {aka AUTS15, CASPR2, CDFE, NRXN4, PTHSL1}, EREG (epiregulin) [NCBI Gene 2069] {aka EPR, ER, Ep}, CNTN1 (contactin 1) [NCBI Gene 1272] {aka CMYO12, CMYP12, F3, GP135, MYPCN}, HIF1A (hypoxia inducible factor 1 subunit alpha) [NCBI Gene 3091] {aka HIF-1-alpha, HIF-1A, HIF-1alpha, HIF1, HIF1-ALPHA, MOP1}, GPRIN3 (GPRIN family member 3) [NCBI Gene 285513] {aka GRIN3}, NDUFV2 (NADH:ubiquinone oxidoreductase core subunit V2) [NCBI Gene 4729] {aka CI-24k, MC1DN7}
- **Diseases:** inflammation (MESH:D007249), Breast Carcinoma (MESH:D001943), Clear Cell Renal Cell Carcinoma (MESH:D002292), hypoxia (MESH:D000860), FXS (MESH:D005600), MSBB (MESH:C537181), metastasis (MESH:D009362), AD (MESH:D000544), ALS (MESH:D008113), Tumor (MESH:D009369)
- **Chemicals:** cysteine (MESH:D003545), MMTS (MESH:C014674), iodoacetamide (MESH:D007460)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12828970/full.md

---
Source: https://tomesphere.com/paper/PMC12828970