Improving Context-Aware Preference Modeling for Language Models
Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni

TL;DR
This paper introduces a two-step approach to improve preference modeling in language models by incorporating context, demonstrating that context-aware models outperform existing ones and align better with human preferences.
Contribution
The paper proposes a two-step preference modeling framework, context selection followed by context-specific evaluation, and contributes new datasets and experiments showing that added context improves preference modeling performance.
Findings
Context-aware reward models outperform non-contextual models.
Finetuned context-aware models surpass GPT-4 and Llama 3 70B on preference tasks.
Adding context improves the alignment of models with human preferences.
Abstract
While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate…
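To make the two-step procedure in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a context is a plain string and that both steps are supplied as callables. Every name in it (`Comparison`, `select_context`, `prefer`, `toy_reward`, `make_scorer`) is hypothetical, and the toy reward is a hand-written rule standing in for a learned context-aware reward model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Comparison:
    """One pairwise preference query (hypothetical container)."""
    prompt: str
    response_a: str
    response_b: str


# Assumed signatures, not taken from the paper:
# (prompt, context) -> score, and (prompt, context, response) -> reward.
ContextScorer = Callable[[str, str], float]
ContextReward = Callable[[str, str, str], float]


def select_context(prompt: str, candidates: Sequence[str], scorer: ContextScorer) -> str:
    """Step 1: resolve under-specification by picking the highest-scoring context."""
    return max(candidates, key=lambda context: scorer(prompt, context))


def prefer(comparison: Comparison, context: str, reward: ContextReward) -> str:
    """Step 2: evaluate preference with respect to the chosen context."""
    r_a = reward(comparison.prompt, context, comparison.response_a)
    r_b = reward(comparison.prompt, context, comparison.response_b)
    return "a" if r_a >= r_b else "b"


def toy_reward(prompt: str, context: str, response: str) -> float:
    """Toy stand-in for a reward model: a child audience favors shorter answers."""
    n_words = len(response.split())
    return float(-n_words) if "child" in context else float(n_words)


def make_scorer(favored: str) -> ContextScorer:
    """Toy context scorer that always favors one fixed context."""
    return lambda prompt, context: 1.0 if context == favored else 0.0


CHILD = "The reader is a young child."
EXPERT = "The reader is a physics graduate student."

pair = Comparison(
    prompt="Explain gravity.",
    response_a="Things fall down because the Earth pulls on them.",
    response_b="Gravity is the curvature of spacetime described by the Einstein field equations.",
)

# The same pair of responses can be ranked differently once the context is fixed.
for favored in (CHILD, EXPERT):
    context = select_context(pair.prompt, [CHILD, EXPERT], make_scorer(favored))
    print(context, "->", prefer(pair, context, toy_reward))
```

Keeping the two steps as separate functions mirrors the error decomposition mentioned in the abstract: a wrong final preference can come either from selecting the wrong context or from misjudging the responses under the right one.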
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Adam · Label Smoothing · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections
