Multi-Reference Preference Optimization for Large Language Models
Hung Le, Quan Tran, Dung Nguyen, Kien Do, Saloni Mittal, Kelechi Ogueji, Svetha Venkatesh

TL;DR
This paper introduces MRPO, a novel method for aligning large language models with human preferences by leveraging multiple reference models, leading to improved generalization and performance across various NLP tasks.
Contribution
The paper presents a new closed-form preference optimization method that incorporates multiple reference models, enhancing preference learning over existing single-reference approaches.
Findings
MRPO outperforms single-reference DPO in preference generalization.
LLMs finetuned with MRPO excel in tasks like GSM8K and TruthfulQA.
MRPO demonstrates robustness across different data availability scenarios.
Abstract
How can Large Language Models (LLMs) be aligned with human intentions and values? A typical solution is to gather human preferences on model outputs and finetune the LLMs accordingly, while ensuring that updates do not deviate too far from a reference model. Recent approaches, such as direct preference optimization (DPO), have eliminated the need for unstable and sluggish reinforcement learning optimization by introducing closed-form supervised losses. However, a significant limitation of current approaches is that they are designed for a single reference model only, neglecting to leverage the collective power of numerous pretrained LLMs. To overcome this limitation, we introduce a novel closed-form formulation for direct preference optimization using multiple reference models. The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse…
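The sketch below illustrates how a DPO-style loss keeps the policy anchored to a reference model: the loss depends only on how much the policy's log-probabilities of the chosen and rejected responses move relative to the reference. It is a minimal, hedged illustration using plain floats for sequence log-probabilities, not the paper's implementation. The multi-reference function is a hypothetical baseline that treats a weighted geometric mixture of references as a single reference (a weighted sum of their log-probabilities); it is not claimed to be MRPO's exact closed-form objective. All function and parameter names (`dpo_loss`, `multi_ref_dpo_loss`, `beta`, `weights`) are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard single-reference DPO loss for one preference pair.

    Each argument is a sequence log-probability log pi(y | x).
    """
    # Implicit reward margin: how far the policy has moved toward the chosen
    # response, relative to the reference, compared with the rejected response.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def multi_ref_dpo_loss(policy_chosen: float, policy_rejected: float,
                       refs_chosen: list[float], refs_rejected: list[float],
                       weights: list[float], beta: float = 0.1) -> float:
    """HYPOTHETICAL multi-reference baseline, not MRPO's exact objective.

    Combines K references as a weighted geometric mixture, i.e. a weighted
    average of their log-probabilities. The mixture's normalizer depends only
    on the prompt x, so it cancels in the chosen-minus-rejected margin.
    """
    if not (len(refs_chosen) == len(refs_rejected) == len(weights)):
        raise ValueError("need one chosen/rejected log-prob and one weight per reference")
    total = sum(weights)
    ref_chosen = sum(w * lp for w, lp in zip(weights, refs_chosen)) / total
    ref_rejected = sum(w * lp for w, lp in zip(weights, refs_rejected)) / total
    return dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta)


if __name__ == "__main__":
    # Policy has raised the chosen response and lowered the rejected one
    # relative to the reference, so the loss falls below log(2).
    print(dpo_loss(-5.0, -10.0, -8.0, -8.0))
    # Two references with equal weight, averaged in log space.
    print(multi_ref_dpo_loss(-5.0, -10.0, [-8.0, -6.0], [-8.0, -10.0], [1.0, 1.0]))
```

When the policy and reference agree (zero margin) the loss equals log 2, and with a single reference the multi-reference function reduces exactly to standard DPO.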
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Direct Preference Optimization
