Do Large Language Models Speak Scientific Workflows?
Orcun Yildiz and Tom Peterka

TL;DR
This paper investigates the capabilities and limitations of large language models in handling scientific workflows through various experiments, revealing their struggles and variability in performance across tasks and systems.
Contribution
It provides an empirical evaluation of LLMs on scientific workflow tasks, highlighting their current limitations and variability, and offers insights for future research and application.
Findings
LLMs often struggle with scientific workflow tasks due to limited domain knowledge.
Performance of LLMs varies significantly across different workflow tasks and systems.
The study offers guidance for workflow developers on LLM capabilities and limitations.
Abstract
With the advent of large language models (LLMs), there is a growing interest in applying LLMs to scientific tasks. In this work, we conduct an experimental study to explore the applicability of LLMs for configuring, annotating, translating, explaining, and generating scientific workflows. We use five different workflow-specific experiments and evaluate several open- and closed-source language models using state-of-the-art workflow systems. Our studies reveal that LLMs often struggle with workflow-related tasks due to their lack of knowledge of scientific workflows. We further observe that the performance of LLMs varies across experiments and workflow systems. Our findings can help workflow developers and users understand LLMs' capabilities in scientific workflows, and motivate further research applying LLMs to workflows.
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices
