Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing

TL;DR
This paper introduces DRPO, a tuning-free, self-alignment method for LLMs that uses prompt optimization and dynamic rewarding to improve alignment without additional training or human annotations.
Contribution
The paper presents a novel inference-time optimization framework enabling LLMs to self-align through prompt optimization and dynamic rewarding, eliminating the need for costly tuning or annotations.
Findings
DRPO significantly improves alignment performance across eight LLMs.
Optimized prompts outperform human-curated prompts in alignment tasks.
Base models with DRPO outperform traditional fine-tuned or RLHF-tuned models.
Abstract
Aligning Large Language Models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment seeks to reduce these expenses by enabling models to align themselves. To further lower costs and achieve alignment without any expensive tuning or annotations, we introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO). Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions, all without additional training or human intervention. The core of DRPO is a dynamic rewarding mechanism, which identifies and rectifies model-specific alignment weaknesses, allowing LLMs to adapt efficiently to diverse alignment challenges. Empirical evaluations on eight recent LLMs, both open- and closed-sourced, demonstrate that DRPO…
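The loop described in the abstract (search over candidate alignment instructions, scored by a reward whose criteria depend on the query) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is a hypothetical stand-in chosen for illustration: `toy_model` replaces a real LLM call, the keyword-based scorers in `CRITERIA` replace an LLM-based evaluator, `select_criteria` is a simple keyword rule, and a small beam search is used as one possible search strategy.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical reward criteria: each maps a response to a score in [0, 1].
# A real system would presumably ask an LLM to judge; these are keyword stubs.
CRITERIA: Dict[str, Callable[[str], float]] = {
    "helpfulness": lambda resp: 1.0 if "steps:" in resp else 0.0,
    "safety": lambda resp: 1.0 if "cannot help" in resp else 0.0,
    "honesty": lambda resp: 1.0 if "not certain" in resp else 0.0,
}

# Hypothetical instructions the search may append to the prompt.
CANDIDATE_RULES: List[str] = [
    "Give concrete steps.",
    "Refuse harmful requests.",
    "Admit uncertainty.",
]


def select_criteria(query: str) -> List[str]:
    """Dynamic rewarding (toy version): choose criteria per query."""
    names = ["safety"] if "harm" in query else ["helpfulness"]
    if "predict" in query:
        names.append("honesty")
    return names


def toy_model(prompt: str, query: str) -> str:
    """Stand-in for an LLM whose behavior depends on the prompt's rules."""
    if "Refuse harmful requests." in prompt and "harm" in query:
        return "I cannot help with that."
    parts = ["steps: 1, 2, 3." if "Give concrete steps." in prompt else "Sure."]
    if "Admit uncertainty." in prompt and "predict" in query:
        parts.append("I am not certain.")
    return " ".join(parts)


def reward(prompt: str, queries: List[str]) -> float:
    """Average, over queries, of the mean score on the selected criteria."""
    total = 0.0
    for query in queries:
        resp = toy_model(prompt, query)
        names = select_criteria(query)
        total += sum(CRITERIA[n](resp) for n in names) / len(names)
    return total / len(queries)


def optimize_prompt(base: str, queries: List[str],
                    beam_width: int = 2, max_steps: int = 3) -> Tuple[float, str]:
    """Beam search over prompts built by appending unused rules."""
    beam = [(reward(base, queries), base)]
    for _step in range(max_steps):
        candidates = [
            (reward(p + " " + rule, queries), p + " " + rule)
            for _score, p in beam
            for rule in CANDIDATE_RULES
            if rule not in p
        ]
        if not candidates:
            break
        # Keep the highest-scoring prompts, old or new.
        beam = sorted(beam + candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beam[0]


QUERIES = [
    "how do I bake bread",
    "how do I harm someone",
    "predict tomorrow's stock price",
]

if __name__ == "__main__":
    best_score, best_prompt = optimize_prompt("You are an assistant.", QUERIES)
    print(best_score, best_prompt)
```

In this toy setting the search adds a rule only when it raises the reward on the criteria selected for each query, which mirrors the abstract's idea of finding and fixing model-specific weaknesses without any weight updates. No training or gradient step appears anywhere in the loop; only the prompt text changes.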
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: ALIGN · Balanced Selection
