The BioExcel methodology for developing dynamic, scalable, reliable and portable computational biomolecular workflows
Jorge Ejarque, Pau Andrio, Adam Hospital, Javier Conejero, Daniele Lezzi, Josep Ll. Gelpi, Rosa M. Badia

TL;DR
This paper introduces a methodology combining BioExcel Building Blocks and PyCOMPSs to simplify the development of scalable, portable, and reliable biomolecular workflows on high-performance computing systems.
Contribution
It presents a novel integrated approach that streamlines the creation and execution of complex biomolecular workflows across diverse distributed computing environments.
Findings
Validated portability across HPC, cloud, and container platforms.
Demonstrated scalability with large biomolecular datasets.
Confirmed reliability and flexibility of the workflows.
Abstract
Developing complex biomolecular workflows is not always straightforward. It requires tedious development work to make the different biomolecular simulation and analysis tools interoperate. Moreover, the need to execute the pipelines on distributed systems adds to the complexity of this work. To address these issues, we propose a methodology to simplify the implementation of these workflows on HPC infrastructures. It combines a library, the BioExcel Building Blocks (BioBBs), which allows scientists to implement biomolecular pipelines as Python scripts, with the PyCOMPSs programming framework, which makes it easy to convert Python scripts into task-based parallel workflows that run on distributed computing systems such as HPC clusters, clouds, and containerized platforms. Using this methodology, we have implemented a set of computational molecular workflows and we…
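To make the idea of turning a Python script into a task-based workflow concrete, here is a minimal sketch using only the Python standard library. It does not use the PyCOMPSs or BioBB APIs: the `task` decorator, the `wait_on` helper, the step functions, and the structure identifiers are all hypothetical stand-ins, and a local thread pool plays the role of the distributed runtime. What it illustrates is the general pattern: decorated calls return immediately with a future, dependencies follow from the data passed between calls, and independent pipelines can run concurrently.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Local thread pool standing in for a distributed runtime (assumption).
_pool = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical stand-in for a task decorator.

    Calling the decorated function submits it to the pool and returns a
    Future right away instead of running the function inline.
    """
    @wraps(func)
    def submit(*args):
        def run():
            # Resolve arguments produced by earlier tasks before running.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _pool.submit(run)
    return submit


# Placeholder pipeline steps; real steps would wrap simulation and analysis tools.
@task
def prepare(structure):
    return f"{structure}:prepared"


@task
def simulate(prepared):
    return f"{prepared}:simulated"


@task
def analyze(trajectory):
    return f"{trajectory}:analyzed"


def wait_on(futures):
    """Block until every pending task has finished and collect the results."""
    return [f.result() for f in futures]


def run_workflow(structures):
    # The loop reads like a sequential script, but each structure's chain
    # of tasks is independent of the others and can run in parallel.
    pending = [analyze(simulate(prepare(s))) for s in structures]
    return wait_on(pending)


if __name__ == "__main__":
    print(run_workflow(["structure_a", "structure_b"]))
```

The point of the pattern is that the workflow body stays an ordinary Python loop; only the decorator and the final synchronization call mark where parallelism and data dependencies come in.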
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
