Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows
Orlando Marquez Ayala, Patrice Bechard, Emily Chen, Maggie Baird, Jingfei Chen

TL;DR
This paper compares fine-tuning small language models and prompting large language models for generating low-code workflows, finding fine-tuning still offers quality advantages despite the popularity of prompting LLMs.
Contribution
It provides empirical evidence that fine-tuning SLMs can outperform prompting LLMs in domain-specific structured output tasks, specifically low-code workflow generation.
Findings
Fine-tuning an SLM improves quality by 10% on average over prompting LLMs.
A good prompt yields reasonable results, but of lower quality than fine-tuning.
Error analysis reveals specific model limitations.
Abstract
Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs are reduced, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications -- faster inference, lower costs -- may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.
Taxonomy
Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Scientific Computing and Data Management
