Reflective Prompt Tuning through Language Model Function-Calling
Farima Fatahi Bayat, Moin Aminnaseri, Pouya Pezeshkpour, Estevam Hruschka

TL;DR
Reflective Prompt Tuning (RPT) leverages language model function calling to iteratively diagnose and revise prompts, significantly improving reasoning performance and calibration across multiple tasks.
Contribution
This paper introduces RPT, a novel framework that automates prompt optimization by simulating human-like iterative diagnosis and revision using LLM function calling.
Findings
RPT improves task performance by up to 12.9 points.
RPT is especially effective on multi-hop and mathematical reasoning.
RPT enhances confidence calibration in LLMs.
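The summary does not say which calibration metric the paper reports. As an illustration only, one widely used measure is expected calibration error (ECE), which compares a model's stated confidence with its observed accuracy inside confidence bins. The sketch below is an assumption about how calibration might be measured, not the authors' evaluation code; the function name and the choice of 10 equal-width bins are hypothetical.

```python
from typing import List


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin.

    Assumption: confidences lie in [0, 1]. This metric is illustrative and is
    not confirmed to be the one used in the paper.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A lower value means stated confidence tracks accuracy more closely; a model that reports 0.9 confidence but is right 75% of the time in that bin contributes a gap of 0.15.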
Abstract
Large language models (LLMs) have become increasingly capable of following instructions and performing complex reasoning, making prompting a flexible interface for adapting models without parameter updates. Yet prompt design remains labor-intensive and highly sensitive to formatting, phrasing, and instruction order, motivating automated prompt optimization methods that reduce manual effort while preserving inference-time flexibility. Existing methods, however, often search over prompt candidates or use fixed critique-refine pipelines driven by individual examples or small batches, limiting their ability to capture systematic error patterns and make targeted edits grounded in failure history. We propose Reflective Prompt Tuning (RPT), a framework that uses LLM function calling to simulate the iterative workflow of human prompt engineers. An LLM optimizer calls a diagnostic function that evaluates…
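The abstract is truncated here, so the details of the diagnostic function are not available. The following is a minimal sketch of a generic diagnose-and-revise loop in the spirit of the description above, under stated assumptions: `diagnose`, `tune`, `toy_solver`, and `toy_reviser` are hypothetical names, the optimizer and task model are plain Python callables rather than real LLM function-calling APIs, and the loop is fixed rather than driven by the optimizer's own decisions to call a function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)


@dataclass
class Diagnosis:
    accuracy: float
    failures: List[Example]


def diagnose(prompt: str, solver: Callable[[str, str], str],
             data: List[Example]) -> Diagnosis:
    """Hypothetical diagnostic function: score a prompt on the whole dataset
    and collect every failing example, not just a single case or small batch."""
    failures = [(q, gold) for q, gold in data if solver(prompt, q) != gold]
    return Diagnosis(1.0 - len(failures) / len(data), failures)


def tune(prompt: str, solver, reviser, data: List[Example], max_rounds: int = 5):
    """Alternate diagnosis and revision, keeping the full failure history."""
    best_prompt, best = prompt, diagnose(prompt, solver, data)
    history = [(prompt, best)]
    for _ in range(max_rounds):
        if not best.failures:
            break  # nothing left to fix
        # The reviser sees the whole history so edits can target recurring errors.
        candidate = reviser(best_prompt, history)
        result = diagnose(candidate, solver, data)
        history.append((candidate, result))
        if result.accuracy > best.accuracy:
            best_prompt, best = candidate, result
    return best_prompt, best, history


def toy_solver(prompt: str, question: str) -> str:
    # Stand-in for the task LLM: only adds correctly if the prompt says "add".
    a, b = map(int, question.split("+"))
    return str(a + b) if "add" in prompt else "0"


def toy_reviser(prompt: str, history) -> str:
    # Stand-in for the optimizer LLM: appends one targeted instruction.
    return prompt + " Always add the two numbers."
```

With the toy stand-ins, `tune("Answer the question.", toy_solver, toy_reviser, [("1+2", "3"), ("2+2", "4")])` revises the prompt once and then stops because the diagnosis reports no remaining failures. In a real setup the two stand-ins would be replaced by model calls, and the diagnostic would be exposed to the optimizer as a callable tool.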
