REAL: Response Embedding-based Alignment for LLMs
Honggen Zhang, Xufeng Zhao, Igor Molybog, June Zhang

TL;DR
This paper introduces REAL, a response embedding-based method that selects less ambiguous response pairs for labeling, reducing annotation bias and effort while improving LLM alignment quality.
Contribution
REAL proposes a novel embedding-based selection strategy for response pairs that improves annotation efficiency and alignment accuracy in LLMs.
Findings
Selecting dissimilar response pairs improves LLM alignment.
The method reduces labeling errors and annotation effort.
The method improves performance on dialogue tasks with less annotation work.
Abstract
Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, and it usually involves training on supervised datasets. Popular algorithms such as Direct Preference Optimization (DPO) rely on pairs of AI-generated responses ranked according to human annotation. The response pair annotation process can introduce human bias, and building a correct preference dataset is the costly part of the alignment pipeline. To improve annotation efficiency and quality in LLM alignment, we propose REAL: Response Embedding-based Alignment for LLMs, a strategy for constructing a high-quality training dataset that focuses on acquiring the less ambiguous preference pairs for labeling out of a set of response candidates. Our selection process is based on the similarity of response embeddings, independently of prompts, which guarantees the selection process in…
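The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, and the toy two-dimensional embeddings are all assumptions made for illustration, and the abstract does not specify the embedding model. Given embeddings of several candidate responses to one prompt, the sketch picks the pair with the lowest similarity as the pair to send for labeling.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_least_similar_pair(embeddings):
    """Return (i, j, similarity) for the least similar pair of responses.

    Hypothetical helper: it scans every pair of candidate embeddings and
    keeps the one with the lowest cosine similarity.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two candidate responses")
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or sim < best[2]:
            best = (i, j, sim)
    return best


# Toy, made-up embeddings for three candidate responses to one prompt.
candidates = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
i, j, sim = select_least_similar_pair(candidates)
print(f"label responses {i} and {j} (cosine similarity {sim:.3f})")
```

Because the selection uses only the response embeddings, it can be computed once before training, without consulting the prompt or the model being aligned. In practice the vectors would come from a sentence or LLM embedding model rather than being written by hand.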
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
