Frag’n’Flow: automated workflow for large-scale quantitative proteomics in high performance computing environments
Istvan Szepesi-Nagy, Roberta Borosta, Zoltan Szabo, Gabor E. Tusnady, Lorinc S. Pongor, Gergely Rona

TL;DR
Frag’n’Flow is an automated pipeline that makes large-scale proteomics data analysis faster and easier using high-performance computing.
Contribution
Frag’n’Flow combines FragPipe and Nextflow to automate and optimize large-scale proteomics analysis in HPC environments.
Findings
Frag’n’Flow reduces runtime by nearly half on a typical DIA dataset while maintaining quantitative accuracy.
The pipeline's results were validated across label-free DDA, DIA, and TMT datasets, with minimal user input required.
It alleviates memory and I/O bottlenecks, enabling efficient analysis of large MS datasets.
Abstract
Analysing large-scale, complex mass spectrometry-based proteomics datasets often overwhelms desktop computational resources and requires manual configuration. While FragPipe delivers rapid peptide identification across diverse sample preparation and acquisition modes (DDA, DIA, TMT), it remains challenging to deploy at scale. We introduce Frag’n’Flow, a Nextflow-based pipeline that encapsulates FragPipe, automates input manifest and workflow generation, manages tool dependencies, and includes downstream data analysis options to enable reproducible, high-performance analyses on HPC, cloud, and cluster environments. Benchmarking against other workflow-based solutions shows that our pipeline maintains quantitative accuracy and cuts runtime nearly in half on a typical DIA dataset of ~58 GB, while alleviating memory and I/O bottlenecks. We validate Frag’n’Flow results across three…
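To make the "input manifest generation" step concrete, here is a minimal Python sketch of how a run manifest might be assembled from a list of spectrum files. It is not Frag’n’Flow's actual implementation. The four-column, tab-separated layout (file path, experiment, bioreplicate, data type) is an assumption about the manifest format, and the `<experiment>_<replicate>.mzML` naming scheme and both function names are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def build_manifest_rows(raw_files, data_type="DIA"):
    """Return one (path, experiment, bioreplicate, data_type) row per file.

    Assumes hypothetical file names of the form <experiment>_<replicate>.mzML.
    """
    rows = []
    for path in sorted(raw_files):
        stem = PurePosixPath(path).stem
        # Split on the last underscore: "ctrl_1" -> ("ctrl", "1").
        experiment, _, replicate = stem.rpartition("_")
        if not experiment or not replicate.isdigit():
            raise ValueError(f"cannot parse experiment/replicate from {path!r}")
        rows.append((path, experiment, int(replicate), data_type))
    return rows


def manifest_text(rows):
    """Serialise rows as tab-separated text (assumed manifest layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `manifest_text(build_manifest_rows(files))` yields one line per input file. In a real pipeline, the experiment and replicate annotations would more likely come from a user-supplied sample sheet than from file names.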
Taxonomy
Topics: Advanced Proteomics Techniques and Applications · Mass Spectrometry Techniques and Applications · Vaccines and immunoinformatics approaches
