Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

Somanshu Singla; Zhen Wang; Tianyang Liu; Abdullah Ashfaq; Zhiting Hu; Eric P. Xing

arXiv:2411.08733·cs.CL·November 15, 2024

TL;DR

This paper introduces DRPO, a tuning-free self-alignment method for LLMs that combines prompt optimization with dynamic rewarding to improve alignment without additional training or human annotations.

Contribution

The paper presents a novel inference-time optimization framework enabling LLMs to self-align through prompt optimization and dynamic rewarding, eliminating the need for costly tuning or annotations.

Findings

01

DRPO significantly improves alignment performance across eight LLMs.

02

Optimized prompts outperform human-curated prompts in alignment tasks.

03

Base models with DRPO outperform traditional fine-tuned or RLHF-tuned models.

Abstract

Aligning Large Language Models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment seeks to reduce these expenses by enabling models to align themselves. To further lower costs and achieve alignment without any expensive tuning or annotations, we introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO). Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions, all without additional training or human intervention. The core of DRPO is a dynamic rewarding mechanism, which identifies and rectifies model-specific alignment weaknesses, allowing LLMs to adapt efficiently to diverse alignment challenges. Empirical evaluations on eight recent LLMs, both open- and closed-sourced, demonstrate that DRPO…
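
The search-and-reward loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-based criteria, the `select_criteria`, `reward`, `propose`, and `optimize_prompt` helpers, and the choice of a small beam search are all assumptions made for illustration, and simple string checks stand in for the LLM calls a real system would make. The sketch only shows the general shape: choose reward criteria per query, score a candidate prompt, turn the weakest criteria into prompt edits, and keep the best candidates.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical reward criteria; each maps a prompt to a score in [0, 1].
# A real system would ask an LLM to judge responses, not match keywords.
CRITERIA: Dict[str, Callable[[str], float]] = {
    "helpfulness": lambda p: 1.0 if "helpful" in p else 0.0,
    "honesty": lambda p: 1.0 if "honest" in p else 0.0,
    "safety": lambda p: 1.0 if "safe" in p else 0.0,
}

# Hypothetical instruction added to the prompt to repair each weakness.
FIXES: Dict[str, str] = {
    "helpfulness": "Be helpful.",
    "honesty": "Be honest.",
    "safety": "Stay safe.",
}


def select_criteria(query: str) -> List[str]:
    """Stand-in for dynamic rewarding: pick criteria relevant to this query."""
    chosen = ["helpfulness"]
    if "medical" in query:
        chosen.append("safety")
    if "fact" in query:
        chosen.append("honesty")
    return chosen


def reward(prompt: str, query: str) -> Tuple[float, List[str]]:
    """Return the mean score over the selected criteria and the failed ones."""
    scores = {name: CRITERIA[name](prompt) for name in select_criteria(query)}
    weaknesses = [name for name, score in scores.items() if score < 1.0]
    return sum(scores.values()) / len(scores), weaknesses


def propose(prompt: str, weaknesses: List[str]) -> List[str]:
    """Generate one candidate prompt per identified weakness."""
    return [f"{prompt} {FIXES[w]}".strip() for w in weaknesses]


def optimize_prompt(seed: str, queries: List[str],
                    beam_width: int = 2, steps: int = 3) -> str:
    """Beam search over prompts, guided by query-specific rewards."""
    def avg_reward(p: str) -> float:
        return sum(reward(p, q)[0] for q in queries) / len(queries)

    beam = [seed]
    for _ in range(steps):
        candidates = set(beam)
        for p in beam:
            for q in queries:
                _, weaknesses = reward(p, q)
                candidates.update(propose(p, weaknesses))
        # Highest average reward first; ties broken alphabetically.
        beam = sorted(candidates, key=lambda p: (-avg_reward(p), p))[:beam_width]
    return beam[0]


if __name__ == "__main__":
    queries = ["medical advice", "a fact check"]
    print(optimize_prompt("You are an assistant.", queries))
```

In this toy setting no model weights change: only the prompt text is searched, which is the sense in which such a procedure is tuning-free.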

Code & Models

Repositories

Singla17/DRPO
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: ALIGN · Balanced Selection