To Write or to Automate Linguistic Prompts, That Is the Question
Marina Sánchez-Torrón, Daria Akselrod, Jason Rauchwerk

TL;DR
This paper systematically compares manual and automated prompt optimization methods for linguistic tasks using LLMs, revealing task-dependent results and highlighting the strengths of each approach.
Contribution
The paper provides the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across three linguistic tasks and five model configurations.
Findings
In terminology insertion, optimized and manual prompts produce mostly statistically indistinguishable quality.
In translation, each approach wins on different models.
In language quality assessment (LQA), expert prompts achieve stronger error detection, while optimization improves error characterization.
Abstract
LLM performance is highly sensitive to prompt design, yet whether automatic prompt optimization can replace expert prompt engineering in linguistic tasks remains unexplored. We present the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across translation, terminology insertion, and language quality assessment, evaluating five model configurations. Results are task-dependent. In terminology insertion, optimized and manual prompts produce mostly statistically indistinguishable quality. In translation, each approach wins on different models. In LQA, expert prompts achieve stronger error detection while optimization improves characterization. Across all tasks, GEPA elevates minimal DSPy signatures, and the majority of expert-optimized comparisons show no statistically significant difference. We note that the…
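The abstract reports that most expert-versus-optimized comparisons show no statistically significant difference, but the excerpt here does not say which test was used. As an illustration only, the sketch below shows one common way to compare two prompting approaches scored on the same segments: a paired bootstrap test on the mean score difference. The function name `paired_bootstrap_p`, its parameters, and the idea of per-segment quality scores are assumptions made for this example, not details taken from the paper.

```python
import random
from statistics import mean


def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Return (mean difference, two-sided p-value) for paired scores.

    Hypothetical helper: scores_a and scores_b are per-segment quality
    scores for two prompting approaches on the same test segments.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one pair per segment")
    if not scores_a:
        raise ValueError("at least one pair of scores is required")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)

    # Shift the differences to have mean zero so that resampling
    # simulates the null hypothesis of "no difference on average".
    centered = [d - observed for d in diffs]
    n = len(centered)

    extreme = 0
    for _ in range(n_resamples):
        sample = [centered[rng.randrange(n)] for _ in range(n)]
        if abs(mean(sample)) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)
```

A call such as `paired_bootstrap_p(expert_scores, optimized_scores)` would return the observed mean difference and an estimated p-value. A p-value above a chosen threshold (0.05 is a common convention) would be read as "no statistically significant difference" between the two approaches on that data.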
