Triple Preference Optimization: Achieving Better Alignment using a Single Step Optimization

Amir Saeidi; Shivanshu Verma; Aswin RRV; Kashif Rasul; Chitta Baral

arXiv:2405.16681 · cs.CL · February 19, 2025 · 1 citation

TL;DR

This paper introduces Triple Preference Optimization (TPO), a novel one-step preference learning method that enhances reasoning and instruction-following in large language models, outperforming existing methods like DPO with less data.

Contribution

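The paper proposes TPO, a preference optimization technique that targets the limitations of DPO named in the abstract (degraded reasoning ability and sensitivity to judgment noise and training-set size), improving LLM alignment on reasoning and instruction-following tasks through one-step optimization.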

Findings

01. TPO outperforms DPO and SimPO on multiple benchmarks.
02. TPO achieves up to 19.2% improvement on GSM8K.
03. TPO requires less data than DPO for comparable performance.

Abstract

Reinforcement Learning with Human Feedback (RLHF) enhances the alignment of Large Language Models (LLMs). However, its limitations have led to the development of Direct Preference Optimization (DPO), an RL-free approach designed to overcome these shortcomings. While studies have shown that DPO improves instruction-following capabilities, it negatively impacts the reasoning ability of LLMs. Additionally, DPO is highly sensitive to judgment noise in preference datasets and the size of the training set. Although several modifications to DPO have been proposed, they still fail to fully resolve these issues. To address these limitations, we propose Triple Preference Optimization (TPO), a new preference learning method designed to enhance both reasoning and instruction-following abilities through one-step optimization. We compare TPO against DPO and its recent variants using state-of-the-art…

Code & Models

Repositories

sahsaeedi/triple-preference-optimization (PyTorch, official)

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms

Methods: Direct Preference Optimization · ALIGN · Shrink and Fine-Tune