Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

Xiaohua Wang; Zisu Huang; Feiran Zhang; Zhibo Xu; Cenyuan Zhang; Qi Qian; Xiaoqing Zheng; Xuanjing Huang

arXiv:2407.01461·cs.CL·July 1, 2025

TL;DR

This paper introduces a reinforcement learning-based prompt refinement framework that rewrites user queries before they reach a large language model, improving response quality and robustness against harmful jailbreak prompts.

Contribution

It presents a transferable, pluggable query refinement model, trained with a specially designed reinforcement learning approach, that helps LLMs produce more honest, harmless, and helpful responses and resist adversarial jailbreak prompts.

Findings

01 Enhanced response quality in LLMs

02 Improved robustness against jailbreak attacks

03 Effective reinforcement learning training method

Abstract

The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses relies heavily on the quality of user prompts. However, these prompts are often brief and vague, which limits the full potential of LLMs. Moreover, adversaries can meticulously craft and manipulate harmful prompts to jailbreak LLMs, inducing them to produce potentially toxic content. To enhance the capabilities of LLMs while maintaining strong robustness against harmful jailbreak inputs, this study proposes a transferable and pluggable framework that refines user prompts before they are input into LLMs. This strategy improves the quality of the queries, enabling LLMs to generate more truthful, benign, and useful responses. Specifically, a lightweight query refinement model is introduced and trained using a specially designed reinforcement learning approach…
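
The pluggable arrangement described in the abstract, where a refinement step sits in front of an unchanged response model, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `make_refined_pipeline`, `toy_refiner`, and `toy_llm` are hypothetical names, and both models are replaced by plain text-to-text functions so the sketch runs without any model weights.

```python
from typing import Callable

# Assumption: both stages are modeled as plain text-to-text callables.
Refiner = Callable[[str], str]
Responder = Callable[[str], str]


def make_refined_pipeline(refine: Refiner, respond: Responder) -> Responder:
    """Wrap a response model so every query is refined before it is answered."""

    def pipeline(user_prompt: str) -> str:
        refined = refine(user_prompt)   # step 1: rewrite the raw user query
        return respond(refined)         # step 2: the response model sees only the refined query

    return pipeline


def toy_refiner(prompt: str) -> str:
    # Hypothetical stand-in for the lightweight refinement model.
    return prompt.strip() + " Please answer clearly and safely."


def toy_llm(prompt: str) -> str:
    # Hypothetical stand-in for the downstream LLM.
    return f"[response to: {prompt}]"


pipeline = make_refined_pipeline(toy_refiner, toy_llm)
print(pipeline("  tell me about RL "))
```

Because the wrapper only depends on the two callables, the same refiner could in principle be placed in front of a different response model without modifying that model, which is the sense of "transferable and pluggable" used above.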

Code & Models

Repositories

huangzisu/query-refinement (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies