$i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization

Long Tan Le; Han Shu; Tung-Anh Nguyen; Choong Seon Hong; Nguyen H. Tran

arXiv:2405.15230 · cs.AI · October 30, 2024

TL;DR

The paper introduces $i$REPO, an alignment framework for large language models that regresses on implicit reward pairwise differences using self-generated data, and reports that it outperforms baseline preference optimization methods.

Contribution

The paper proposes a new preference optimization method, $i$REPO, that aligns LLMs by regressing on implicit reward pairwise differences and is backed by theoretical guarantees.

Findings

01

$i$REPO outperforms baseline preference optimization methods.

02

Effective self-alignment using self-generated responses and AI annotator logits.

03

Theoretical guarantees for optimality and practical performance-gap analysis.

Abstract

While astonishingly capable, large language models (LLMs) can sometimes produce outputs that deviate from human expectations. Such deviations necessitate an alignment phase to prevent the dissemination of untruthful, toxic, or biased information. Traditional alignment methods based on reinforcement learning often struggle with instability, whereas preference optimization methods are limited by overfitting to pre-collected hard-label datasets. In this paper, we propose a novel LLM alignment framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization. In particular, $i$REPO employs self-generated datasets labeled by empirical human (or AI annotator) preference to iteratively refine the aligned policy through a novel regression-based loss function. Furthermore, we introduce an innovative algorithm backed by…
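The abstract is cut off before the loss is defined, so the following is only a rough sketch of what "implicit reward pairwise difference regression" could look like, not the paper's actual formulation. It assumes a DPO-style implicit reward (a scaled log-ratio between the policy and a reference model) and assumes the regression target is the logit of the empirical preference rate from annotator votes. All function names, the `beta` scale, and the clipping constant `eps` are hypothetical.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Assumed DPO-style implicit reward: beta * log(pi(y|x) / pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def empirical_logit(wins_a: int, wins_b: int, eps: float = 1e-3) -> float:
    """Logit of the empirical rate at which annotators preferred response A.

    The rate is clipped to [eps, 1 - eps] so unanimous votes stay finite.
    """
    p = wins_a / (wins_a + wins_b)
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def pairwise_regression_loss(
    logp_a: float,
    logp_ref_a: float,
    logp_b: float,
    logp_ref_b: float,
    wins_a: int,
    wins_b: int,
    beta: float = 0.1,
) -> float:
    """Squared error between the implicit reward difference and the
    empirical preference logit (an assumed form of the regression loss)."""
    diff = implicit_reward(logp_a, logp_ref_a, beta) - implicit_reward(
        logp_b, logp_ref_b, beta
    )
    return (diff - empirical_logit(wins_a, wins_b)) ** 2


if __name__ == "__main__":
    # Two self-generated responses; 9 of 10 annotator votes prefer A.
    loss = pairwise_regression_loss(-12.0, -12.5, -14.0, -13.0, wins_a=9, wins_b=1)
    print(f"loss = {loss:.4f}")
```

Under these assumptions, a soft target built from vote counts replaces the hard win/lose label used by methods trained on pre-collected preference pairs, which is the contrast the abstract draws.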

Taxonomy

Topics: Multi-Criteria Decision Making