StablePrompt: Automatic Prompt Tuning using Reinforcement Learning for Large Language Models

Minchan Kwon; Gaeun Kim; Jongsuk Kim; Haeil Lee; Junmo Kim

arXiv:2410.07652 · cs.CL · October 11, 2024

TL;DR

StablePrompt introduces an RL-based prompt tuning method with adaptive updates to improve stability and performance across multiple NLP tasks in large language models.

Contribution

The paper proposes StablePrompt, an RL framework that pairs Adaptive Proximal Policy Optimization (APPO) with an LLM anchor model to make prompt tuning more stable and more effective.

Findings

01 Outperforms previous prompt tuning methods on multiple NLP tasks

02 Achieves higher stability in reinforcement learning-based prompt search

03 Maintains linguistic capabilities of pre-trained LLMs

Abstract

Finding appropriate prompts for a specific task has become an important issue as the usage of Large Language Models (LLMs) has expanded. Reinforcement Learning (RL) is widely used for prompt tuning, but its inherent instability and environmental dependency make it difficult to use in practice. In this paper, we propose StablePrompt, which strikes a balance between training stability and search space, mitigating the instability of RL and producing high-performance prompts. We formulate prompt tuning as an online RL problem between the agent and target LLM and introduce Adaptive Proximal Policy Optimization (APPO). APPO introduces an LLM anchor model to adaptively adjust the rate of policy updates. This allows for flexible prompt search while preserving the linguistic ability of the pre-trained LLM. StablePrompt outperforms previous methods on various tasks including text classification,…
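The abstract does not spell out the APPO objective, so the following is only an illustrative sketch of one way an anchor model could limit how fast a policy moves: a standard PPO clipped surrogate with a KL penalty measured against the anchor, plus a rule that promotes the current policy to anchor when its validation score improves. The function names, the penalty weight `beta`, and the promotion rule are all assumptions made for illustration and are not taken from the paper or its repository.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for a single sample."""
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def anchored_objective(ratio, advantage, policy_probs, anchor_probs,
                       beta=0.1, eps=0.2):
    """Clipped surrogate minus a KL penalty toward the anchor (assumed form)."""
    penalty = beta * kl_divergence(policy_probs, anchor_probs)
    return clipped_surrogate(ratio, advantage, eps) - penalty


def should_promote_to_anchor(anchor_score, candidate_score):
    """Hypothetical rule: replace the anchor only when validation score improves."""
    return candidate_score > anchor_score
```

In this sketch, a policy that drifts far from the anchor pays a growing penalty, while promoting a better policy to anchor resets that penalty and lets the search continue from the new reference point.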

Code & Models

Repositories

kmc0207/Stableprompt (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Adaptive Proximal Policy Optimization (APPO)