QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

Yilun Kong; Hangyu Mao; Qi Zhao; Bin Zhang; Jingqing Ruan; Li Shen; Yongzhe Chang; Xueqian Wang; Rui Zhao; Dacheng Tao

arXiv:2408.10504 · cs.AI · June 2, 2025

TL;DR

QPO introduces a multi-loop offline reinforcement learning approach to optimize query-dependent prompts for large language models, reducing interaction costs and improving task-specific performance across diverse NLP and math tasks.

Contribution

The paper proposes a novel offline RL-based method for query-dependent prompt optimization that learns from existing prompting demonstration data and iteratively fine-tunes a small prompt-generating model, reducing the number of online interactions with the target LLM.
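To make the task-level versus query-level distinction concrete, here is a minimal Python sketch over a hypothetical offline log of (query, prompt, reward) triples, where a reward of 1.0 might mean the target LLM answered correctly with that prompt. The log, the prompts, and the helper names (`task_level_best`, `query_level_best`, `mean_reward`) are illustrative assumptions, not part of QPO. The sketch only reuses logged rewards, so no LLM is called.

```python
from collections import defaultdict

# Hypothetical offline log: (query_id, prompt, reward). Rewards are assumed to have
# been collected earlier from the target LLM; nothing here queries a model.
OFFLINE_LOG = [
    ("q1", "Let's think step by step.", 1.0),
    ("q1", "Answer directly.", 0.0),
    ("q2", "Let's think step by step.", 0.0),
    ("q2", "Answer directly.", 1.0),
    ("q3", "Let's think step by step.", 1.0),
    ("q3", "Answer directly.", 0.0),
]

def task_level_best(log):
    """Pick the single prompt with the highest mean reward across all queries."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _, prompt, reward in log:
        totals[prompt] += reward
        counts[prompt] += 1
    return max(totals, key=lambda p: totals[p] / counts[p])

def query_level_best(log):
    """Pick, for each query separately, the logged prompt with the highest reward."""
    best = {}
    for query, prompt, reward in log:
        if query not in best or reward > best[query][1]:
            best[query] = (prompt, reward)
    return {query: prompt for query, (prompt, _) in best.items()}

def mean_reward(log, choose):
    """Average logged reward when each query uses the prompt returned by choose(query)."""
    lookup = {(q, p): r for q, p, r in log}
    queries = sorted({q for q, _, _ in log})
    return sum(lookup[(q, choose(q))] for q in queries) / len(queries)

task_prompt = task_level_best(OFFLINE_LOG)
per_query = query_level_best(OFFLINE_LOG)
print(task_prompt)                                           # Let's think step by step.
print(mean_reward(OFFLINE_LOG, lambda q: task_prompt))       # 0.6666666666666666
print(mean_reward(OFFLINE_LOG, per_query.get))               # 1.0
```

The per-query figure is an upper bound on the logged data only: it says nothing about queries that never appear in the log, which is why a method in this setting needs a learned prompt generator that generalizes to new queries rather than a lookup table.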

Findings

01 Effective across various LLM scales and tasks
02 Significantly reduces interaction costs
03 Improves zero-shot and few-shot performance

Abstract

Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods focus only on task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performance. Additionally, these methods rely heavily on frequent interactions with LLMs to obtain feedback for guiding the optimization process, incurring substantial redundant interaction costs. In this paper, we introduce Query-dependent Prompt Optimization (QPO), which leverages multi-loop offline reinforcement learning to iteratively fine-tune a small pretrained language model to generate optimal prompts tailored to the input queries, thus significantly improving the prompting effect on the large target LLM. We derive insights from offline prompting demonstration data, which…
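The multi-loop idea (fit offline on logged data, then iterate) can be illustrated with a toy tabular stand-in. In this sketch, which is an assumption rather than the authors' algorithm, a lookup table keyed by a hypothetical query "kind" replaces the small pretrained language model, and "fitting" means keeping the prompt with the highest mean logged reward: a greedy substitute for offline RL fine-tuning. `mock_llm_reward` is a hypothetical stand-in for scoring a prompt with the target LLM. Each loop proposes the current policy's prompt plus one untried prompt per query, and only pairs missing from the log trigger a (mock) LLM call.

```python
from collections import defaultdict

# Hypothetical prompt pool and queries; "kind" stands in for the query features a
# small language model would read from the raw query text.
PROMPTS = ["Answer directly.", "Let's think step by step.", "Recall the relevant fact first."]
QUERIES = {"q1": "math", "q2": "fact", "q3": "math", "q4": "fact"}
OFFLINE_LOG = [("q1", "Answer directly.", 0.0), ("q2", "Answer directly.", 0.0)]

def mock_llm_reward(kind, prompt):
    """Hypothetical target-LLM scorer: 1.0 if the prompt suits the query kind, else 0.0."""
    preferred = {"math": "Let's think step by step.", "fact": "Recall the relevant fact first."}
    return 1.0 if prompt == preferred[kind] else 0.0

def fit_policy(dataset):
    """Offline step: for each query kind, keep the prompt with the highest mean logged reward."""
    totals, counts = defaultdict(float), defaultdict(int)
    for query, prompt, reward in dataset:
        key = (QUERIES[query], prompt)
        totals[key] += reward
        counts[key] += 1
    best = {}
    for (kind, prompt), total in totals.items():
        score = total / counts[(kind, prompt)]
        if kind not in best or score > best[kind][1]:  # ties keep the earlier prompt
            best[kind] = (prompt, score)
    return {kind: prompt for kind, (prompt, _) in best.items()}

def multi_loop(dataset, loops):
    """Alternate offline fitting with labeling a few new (query, prompt) pairs."""
    dataset = list(dataset)  # copy so the caller's log is not mutated
    llm_calls = 0
    for _ in range(loops):
        policy = fit_policy(dataset)
        for query, kind in QUERIES.items():
            logged = {p for q, p, _ in dataset if q == query}
            untried = [p for p in PROMPTS if p not in logged]
            # Exploit the current policy, plus explore one untried prompt.
            candidates = [policy.get(kind, PROMPTS[0])] + untried[:1]
            for prompt in candidates:
                if prompt in logged:
                    continue  # reuse the logged reward instead of re-querying the LLM
                dataset.append((query, prompt, mock_llm_reward(kind, prompt)))
                logged.add(prompt)
                llm_calls += 1
    return fit_policy(dataset), dataset, llm_calls

policy, data, calls = multi_loop(OFFLINE_LOG, loops=2)
print(policy)  # {'math': "Let's think step by step.", 'fact': 'Recall the relevant fact first.'}
print(calls)   # 8
```

The exploration and tie-breaking rules are arbitrary choices for the toy; the point is only the structure of alternating offline fitting with reuse of already-logged rewards, so that new LLM calls are spent only on pairs the log does not cover.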
