Large Language Models to the Rescue: Reducing the Complexity in Scientific Workflow Development Using ChatGPT
Mario Sänger, Ninon De Mecquenem, Katarzyna Ewa Lewińska, Vasilis Bountris, Fabian Lehmann, Ulf Leser, Thomas Kosch

TL;DR
This paper examines how ChatGPT, a large language model, can help users comprehend, adapt, and extend complex scientific workflows, and characterizes its strengths and limitations through three user studies in two scientific domains.
Contribution
It demonstrates the potential of LLMs like ChatGPT to support scientific workflow development and identifies areas needing improvement.
Findings
LLMs effectively interpret workflows in scientific domains.
Performance drops when exchanging components or extending workflows.
Limitations of LLMs in complex workflow modifications are characterized.
Abstract
Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs…
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies · Topic Modeling
