Reflective Prompt Tuning through Language Model Function-Calling

Farima Fatahi Bayat; Moin Aminnaseri; Pouya Pezeshkpour; Estevam Hruschka

arXiv:2605.21781 · cs.CL · May 22, 2026

TL;DR

Reflective Prompt Tuning (RPT) leverages language model function calling to iteratively diagnose and revise prompts, significantly improving reasoning performance and calibration across multiple tasks.

Contribution

This paper introduces RPT, a novel framework that automates prompt optimization by simulating human-like iterative diagnosis and revision using LLM function calling.

Findings

01. RPT improves task performance by up to 12.9 points.

02. RPT is especially effective on multi-hop and mathematical reasoning.

03. RPT enhances confidence calibration in LLMs.
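
To make the calibration finding concrete, here is a minimal sketch of Expected Calibration Error (ECE), one common way to quantify how well stated confidence matches accuracy. This summary does not say which calibration metric the paper reports, so treat ECE here as an assumption. The function name, the equal-width binning, and the default of 10 bins are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Bin-size-weighted mean gap between accuracy and average confidence.

    Assumption: equal-width bins over [0, 1]; a confidence of exactly 1.0
    is placed in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical data: the model claims 90% confidence but is right 3 times in 4.
overconfident = expected_calibration_error([0.9] * 4, [True, True, True, False])
# Hypothetical data: claimed confidence of 75% matches the observed accuracy.
well_calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
```

A lower value means confidence tracks accuracy more closely, so an improvement in calibration would show up as a smaller ECE for the tuned prompt than for the original one.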

Abstract

Large language models (LLMs) have become increasingly capable of following instructions and complex reasoning, making prompting a flexible interface for adapting models without parameter updates. Yet prompt design remains labor-intensive and highly sensitive to formatting, phrasing, and instruction order, motivating automated prompt optimization methods that reduce manual effort while preserving inference-time flexibility. However, existing methods often search over prompt candidates or use fixed critique-refine pipelines driven by individual examples or small batches, limiting their ability to capture systematic error patterns and make targeted edits grounded in failure history. We propose Reflective Prompt Tuning (RPT), a framework that uses LLM function calling to simulate the iterative workflow of human prompt engineers. An LLM optimizer calls a diagnostic function that evaluates…
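
The diagnose-and-revise workflow outlined in the abstract can be illustrated with a toy loop. This is a sketch under heavy assumptions, not the paper's implementation: `diagnose`, `optimize`, `toy_model`, and `toy_revise` are hypothetical names, the "model" and "optimizer" are deterministic stand-ins rather than LLM calls, and the acceptance rule (keep a revision only if accuracy improves) is an illustrative choice. In a real system the revision step would be an LLM that invokes the diagnostic through function calling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)


@dataclass
class Diagnosis:
    accuracy: float
    failures: List[Tuple[str, str, str]]  # (question, expected, got)


def diagnose(prompt: str, model: Callable[[str, str], str],
             dataset: List[Example]) -> Diagnosis:
    """Hypothetical diagnostic function: score a prompt on the full dataset."""
    failures = []
    for question, expected in dataset:
        got = model(prompt, question)
        if got != expected:
            failures.append((question, expected, got))
    return Diagnosis(1 - len(failures) / len(dataset), failures)


def optimize(prompt, model, dataset, revise, max_rounds: int = 5):
    """Iterate diagnose -> revise, keeping a history of every attempt."""
    best_prompt, best = prompt, diagnose(prompt, model, dataset)
    history = [(prompt, best)]
    for _ in range(max_rounds):
        if not best.failures:
            break  # nothing left to fix
        candidate = revise(best_prompt, history)
        result = diagnose(candidate, model, dataset)
        history.append((candidate, result))
        if result.accuracy > best.accuracy:
            best_prompt, best = candidate, result
    return best_prompt, history


def toy_model(prompt: str, question: str) -> str:
    """Stand-in for an LLM: adds two integers, verbose unless told otherwise."""
    a, b = map(int, question.split("+"))
    total = str(a + b)
    return total if "digits only" in prompt else f"The answer is {total}"


def toy_revise(prompt: str, history) -> str:
    """Stand-in for the optimizer: patch the prompt based on observed failures."""
    _, last = history[-1]
    if any(got.startswith("The answer is") for _, _, got in last.failures):
        return prompt + " Reply with digits only."
    return prompt


DATASET = [("1+2", "3"), ("10+5", "15"), ("7+7", "14")]
final_prompt, history = optimize("Add the numbers.", toy_model, DATASET, toy_revise)
```

The point of the sketch is the shape of the loop: the revision step sees aggregated failures across the whole dataset plus the history of earlier attempts, rather than reacting to a single example.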
