From Prompt to Pipeline: Large Language Models for Scientific Workflow Development in Bioinformatics
Khairul Alam, Banani Roy

TL;DR
This paper investigates the use of large language models to assist in creating bioinformatics workflows, demonstrating their potential to improve accuracy, completeness, and usability across different platforms and tasks.
Contribution
It introduces a tiered prompting approach to enhance LLM-generated workflows and evaluates multiple models across bioinformatics tasks and platforms, highlighting their practical utility.
Findings
Gemini 2.5 Flash excelled in Galaxy workflows
DeepSeek-V3 outperformed in Nextflow pipeline generation
Prompting strategies significantly improved workflow correctness
Abstract
Scientific Workflow Systems such as Galaxy and Nextflow are essential for scalable, reproducible, and automated bioinformatics analyses. However, developing and understanding scientific workflows remains challenging for many domain scientists due to the complexity of tool/module selection, infrastructure requirements, and limited programming expertise. This study explores whether state-of-the-art Large Language Models such as GPT-4o, Gemini 2.5 Flash, and DeepSeek-V3 can assist in generating accurate, complete, and usable bioinformatics workflows. We evaluate a set of representative workflows covering tasks such as RNA-seq, SNP analysis, and DNA methylation across both Galaxy (graphical) and Nextflow (script-based) platforms. To simulate realistic usage, we adopt a tiered prompting strategy: each workflow is first generated using an instruction-only prompt; if the output is incomplete…
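The escalation idea behind the tiered prompting strategy can be illustrated with a minimal Python sketch. Only the first tier ("instruction-only") is named in the abstract; the abstract is cut off before describing later tiers, so the second tier, the prompt wording, the `fake_model` stand-in, and the `has_three_steps` completeness check are all hypothetical placeholders, not the authors' method.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A tier is a (name, prompt_builder) pair. Only "instruction-only" comes from
# the abstract; any further tier here is a hypothetical placeholder.
Tier = Tuple[str, Callable[[str], str]]


def generate_with_tiers(
    task: str,
    tiers: Sequence[Tier],
    generate: Callable[[str], str],
    is_complete: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in order and return the first output that passes the check.

    Returns (output, names_of_tiers_tried); output is None if every tier fails.
    """
    tried: List[str] = []
    for name, build_prompt in tiers:
        tried.append(name)
        output = generate(build_prompt(task))
        if is_complete(output):
            return output, tried
    return None, tried


# Hypothetical tiers: the second one is an assumption made for illustration.
TIERS: List[Tier] = [
    ("instruction-only", lambda task: f"Write a workflow for: {task}"),
    (
        "with-extra-context",
        lambda task: f"Write a workflow for: {task}\nContext: <additional detail>",
    ),
]


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call: only answers fully when given extra context."""
    return "step1; step2; step3" if "Context:" in prompt else "step1"


def has_three_steps(output: str) -> bool:
    """Toy completeness check; a real one would validate the workflow itself."""
    return output.count(";") >= 2


result, tried = generate_with_tiers("RNA-seq", TIERS, fake_model, has_three_steps)
```

Passing the model call and the completeness check as plain callables keeps the escalation loop independent of any particular LLM API or workflow platform.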
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Research Data Management Practices
