Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Generation for Few-shot Learning
Chengzhengxu Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, Chao Shen

TL;DR
This paper introduces DP2O, a reinforcement learning approach to discrete prompt optimization for few-shot NLP tasks. It combines a GPT-4 dialogue strategy for generating prompts with an efficient prompt screening metric, and it outperforms existing methods at lower computational cost.
Contribution
The paper proposes a new RL-based discrete prompt optimization method, DP2O, using dialogue alignment and a prompt screening metric, improving efficiency and performance over prior approaches.
Findings
DP2O outperforms SOTA by 1.52% in accuracy on four datasets.
DP2O trains only 0.67% as many parameters as the PLM contains.
DP2O demonstrates strong universality, robustness, and generalization.
Abstract
The prompt-based pre-trained language model (PLM) paradigm has been highly successful in few-shot natural language processing (NLP) tasks. However, prior discrete prompt optimization methods require expert knowledge to design the base prompt set and to identify high-quality prompts, which is costly, inefficient, and subjective. Meanwhile, existing continuous prompt optimization methods improve performance by learning ideal prompts from the gradient information of PLMs, but their high computational cost and their low readability and generalizability are often concerning. To address this research gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt Optimization (DP2O) method. We first design a multi-round dialogue alignment strategy, based on GPT-4, that generates a readable prompt set. Furthermore, we propose an efficient prompt screening metric to identify…
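The abstract is truncated before it describes the reinforcement learning objective. The following is therefore only a minimal sketch of what "policy-gradient-based" selection over a discrete prompt set can look like, under several assumptions. It assumes a softmax policy over a fixed list of candidate prompts and a scalar reward per prompt, such as few-shot validation accuracy. It uses plain REINFORCE with a moving-average baseline. This is a generic policy-gradient technique and not DP2O's actual policy network, reward, or screening metric. The function name `reinforce_prompt_selection` and the reward values are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_prompt_selection(rewards, steps=3000, lr=0.2, seed=0):
    """Hypothetical REINFORCE loop over a fixed set of candidate prompts.

    rewards[i] stands in for the score of prompt i (assumed here to be a
    fixed few-shot accuracy). Returns the final selection probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # one logit per candidate prompt
    baseline = 0.0                 # moving-average baseline reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a prompt index from the current policy.
        a = rng.choices(range(len(probs)), weights=probs)[0]
        r = rewards[a]
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # Gradient of log softmax w.r.t. logit i is onehot(a)[i] - probs[i].
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical accuracies for four candidate prompts.
    probs = reinforce_prompt_selection([0.55, 0.62, 0.71, 0.48])
    print([round(p, 3) for p in probs])
```

Because the baseline cancels in expectation, each logit moves in proportion to how much its prompt's reward exceeds the policy-weighted average reward. Probability mass therefore shifts toward the highest-scoring prompt while the sampling keeps exploring the alternatives.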
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Residual Connection
