A Reproducible Optimisation Protocol for Calibrating Prompt-Based Large Language Model Workflows in Evidence Synthesis

Teo Susnjak

arXiv:2605.06937 · cs.LG · May 11, 2026

TL;DR

This paper introduces a reproducible calibration workflow for prompt-based large language models in evidence synthesis, emphasizing transparency, transferability, and systematic optimization.

Contribution

It presents a structured, metric-guided prompt calibration protocol that separates task rules from prompt framing, and validates it on title and abstract screening tasks using DSPy and GEPA.
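The separation between fixed task rules and mutable prompt framing can be illustrated with a minimal plain-Python sketch. It assumes the screening rules can be written as inclusion and exclusion criteria. `TaskSpec`, `PromptHarness` and `render_prompt` are hypothetical names chosen for illustration; they are not DSPy's API or the paper's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskSpec:
    """Fixed scientific task definition: the eligibility rules (hypothetical fields)."""
    name: str
    inclusion: Tuple[str, ...]
    exclusion: Tuple[str, ...]


@dataclass(frozen=True)
class PromptHarness:
    """Framing around the rules; the only part an optimiser is allowed to rewrite."""
    preamble: str
    decision_format: str = "Answer with exactly one word: INCLUDE or EXCLUDE."


def render_prompt(spec: TaskSpec, harness: PromptHarness, title: str, abstract: str) -> str:
    """Combine the fixed rules with one candidate harness into a single prompt string."""
    rules = "\n".join(
        [f"Include if: {c}" for c in spec.inclusion]
        + [f"Exclude if: {c}" for c in spec.exclusion]
    )
    return (
        f"{harness.preamble}\n\nTask: {spec.name}\n{rules}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n{harness.decision_format}"
    )


# Two candidate harnesses frame exactly the same, unchanged rules.
spec = TaskSpec(
    name="Screen records for a review of drug A in adults",
    inclusion=("randomised controlled trial", "adult participants"),
    exclusion=("animal study",),
)
strict = PromptHarness(preamble="You are a strict systematic-review screener.")
sensitive = PromptHarness(preamble="You are a screener; when uncertain, prefer INCLUDE.")
p1 = render_prompt(spec, strict, "Drug A in adults", "A randomised trial in 120 adults.")
p2 = render_prompt(spec, sensitive, "Drug A in adults", "A randomised trial in 120 adults.")
```

Freezing `TaskSpec` is one way to make the rules immutable during search: an optimiser can only propose new `PromptHarness` values, so any gain in the metric comes from framing rather than from quietly redefining eligibility.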

Findings

01. The calibration workflow improves prompt performance on screening tasks

02. Pairing a smaller student LLM with a larger reflection LLM enhances optimization

03. Preserving the calibrated artefact facilitates reproducibility and transferability

Abstract

This methods article presents a reproducible calibration workflow for prompt-based large language models (LLMs) in structured evidence-synthesis tasks. The method separates the rules that define the scientific task from the mutable prompt harness that frames and applies them. It optimises that harness against labelled or reference examples and an explicit task metric, then preserves the calibrated workflow as an inspectable artefact with its specification, metric, settings, and evaluation traces. The example code instantiates the protocol with DSPy and GEPA tools, but the underlying logic can transfer to other prompt-optimisation frameworks that support structured task definitions, metric-guided search, and artefact reuse. Title and abstract screening is the worked validation case because it provides labelled benchmark data and clear evaluation metrics. The demonstrated workflow uses a…
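The optimise-then-preserve loop described in the abstract can be sketched in plain Python. This is a simplified stand-in, not the paper's method: where the example code uses GEPA, with a reflection LLM proposing revised harness text, this sketch only scores a fixed list of candidate harnesses. A keyword stub replaces the student LLM, and F2 (recall-weighted, on the assumption that missing an eligible study is costlier than an extra full-text check) is an assumed metric. `f_beta`, `calibrate`, `toy_classify` and the artefact field names are hypothetical.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, bool]  # (title/abstract text, gold label: True = include)


def f_beta(preds: Sequence[bool], golds: Sequence[bool], beta: float = 2.0) -> float:
    """F-beta for the INCLUDE class; beta > 1 weights recall above precision."""
    tp = sum(p and g for p, g in zip(preds, golds))
    fp = sum(p and not g for p, g in zip(preds, golds))
    fn = sum(g and not p for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def calibrate(candidates: Sequence[str], classify: Callable[[str, str], bool],
              dev: Sequence[Example], spec: Dict, settings: Dict) -> Dict:
    """Score every candidate harness on labelled examples and keep the best one."""
    traces: List[Dict] = []
    best_harness, best_score = None, -1.0
    golds = [gold for _, gold in dev]
    for harness in candidates:
        preds = [classify(harness, text) for text, _ in dev]
        score = f_beta(preds, golds, beta=settings["beta"])
        traces.append({"harness": harness, "predictions": preds, "score": score})
        if score > best_score:
            best_harness, best_score = harness, score
    # Bundle everything needed to inspect, rerun or transfer the calibrated workflow.
    return {"spec": spec, "metric": f"F{settings['beta']:g} (include class)",
            "settings": settings, "harness": best_harness,
            "dev_score": best_score, "traces": traces}


def toy_classify(harness: str, text: str) -> bool:
    """Keyword stub standing in for the student LLM."""
    return "randomised" in text or ("when unsure" in harness and "trial" in text)


dev = [("randomised trial of drug A", True), ("pilot trial of drug B", True),
       ("narrative review of drug C", False), ("editorial on drug D", False)]
artefact = calibrate(
    candidates=["Screen strictly.", "Screen for recall: when unsure, include."],
    classify=toy_classify, dev=dev,
    spec={"inclusion": ["trial of drug A or B"]}, settings={"beta": 2.0})
saved = json.dumps(artefact, indent=2)  # persist this next to the code that produced it
print(artefact["harness"], round(artefact["dev_score"], 3))
# prints: Screen for recall: when unsure, include. 1.0
```

The strict harness misses the pilot trial (recall 0.5, F2 about 0.556), so the recall-oriented harness is kept. The spec stays fixed throughout, and the saved JSON records spec, metric, settings and per-candidate traces together, which is the property the protocol relies on for reuse.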