From Research Question to Scientific Workflow: Leveraging Agentic AI for Science Automation
Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas, Michal Kuszewski

TL;DR
This paper introduces an agentic AI architecture that automates the semantic translation of research questions into scientific workflows, improving accuracy and efficiency in workflow generation.
Contribution
The proposed multi-layered architecture combines language models, validation, and domain-specific skills to automate and improve scientific workflow creation.
Findings
Skills increase intent accuracy from 44% to 83%.
Skill-driven workflow generation reduces data transfer by 92%.
Pipeline completes queries with low latency and cost.
Abstract
Scientific workflow systems automate execution (scheduling, fault tolerance, resource management) but not the semantic translation that precedes it. Scientists still manually convert research questions into workflow specifications, a task requiring both domain knowledge and infrastructure expertise. We propose an agentic architecture that closes this gap through three layers: an LLM interprets natural language into structured intents (semantic layer); validated generators produce reproducible workflow DAGs (deterministic layer); and domain experts author "Skills": markdown documents encoding vocabulary mappings, parameter constraints, and optimization strategies (knowledge layer). This decomposition confines LLM non-determinism to intent extraction: identical intents always yield identical workflows. We implement and evaluate the architecture on the 1000 Genomes population…
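The claim that identical intents always yield identical workflows can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Intent` fields, the `ALLOWED_ANALYSES` constraint, the task-naming scheme, and the functions `validate`, `generate_workflow`, and `workflow_digest` are all hypothetical names chosen for illustration. The sketch assumes the LLM has already produced a structured intent and shows only how a pure, validated generator could map it to a DAG.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical structured intent, as a semantic layer might emit it."""
    analysis: str
    chromosomes: tuple
    populations: tuple


# Hypothetical constraint of the kind a Skill document might encode.
ALLOWED_ANALYSES = {"frequency", "overlap"}


def validate(intent: Intent) -> None:
    """Reject intents that violate the (assumed) parameter constraints."""
    if intent.analysis not in ALLOWED_ANALYSES:
        raise ValueError(f"unknown analysis: {intent.analysis!r}")
    if not intent.chromosomes:
        raise ValueError("at least one chromosome is required")


def generate_workflow(intent: Intent) -> dict:
    """Pure function from intent to DAG, stored as task -> list of dependencies."""
    validate(intent)
    dag = {}
    # Sort and deduplicate inputs so argument order cannot change the output.
    for chrom in sorted(set(intent.chromosomes)):
        fetch = f"fetch_chr{chrom}"
        dag[fetch] = []
        for pop in sorted(set(intent.populations)):
            dag[f"{intent.analysis}_chr{chrom}_{pop}"] = [fetch]
    # The final task depends on every analysis task generated so far.
    dag["merge"] = sorted(t for t in dag if not t.startswith("fetch_"))
    return dag


def workflow_digest(dag: dict) -> str:
    """Hash a canonical JSON encoding so two workflows can be compared cheaply."""
    canonical = json.dumps(dag, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the generator involves no randomness and no model call, comparing `workflow_digest` values for two runs is one simple way to check reproducibility; any variation would have to originate upstream, in intent extraction.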
