Plug-and-Play Training Framework for Preference Optimization

Jingyuan Ma; Rui Li; Zheng Li; Lei Sha; Zhifang Sui

arXiv:2412.20996 · cs.CL · December 31, 2024

TL;DR

This paper introduces a flexible training framework that enhances preference optimization for large language models by weighting training samples according to their difficulty, leading to improved performance on complex tasks such as mathematical reasoning.

Contribution

It proposes a novel plug-and-play training method that dynamically weights training samples by their difficulty, improving the effectiveness of preference optimization.
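If the base method is DPO (the method listed under Methods), weighting training samples amounts to scaling each example's DPO loss by a per-example weight. The block below is a hedged sketch of that idea, not the paper's formula: the accuracy estimate $\hat{a}(x)$ over $K$ sampled responses, the linear weight $w(x)$, and the floor $w_{\min}$ are all hypothetical choices made for illustration.

```latex
\begin{align*}
\hat{a}(x) &= \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}\!\left[\hat{y}_k \text{ is correct}\right],
  \qquad \hat{y}_k \sim \pi_\theta(\cdot \mid x) \\
w(x) &= w_{\min} + (1 - w_{\min})\,\bigl(1 - \hat{a}(x)\bigr) \\
\mathcal{L}(\theta) &= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \left[ w(x)\,\log\sigma\!\left(
  \beta\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta\log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)\right]
\end{align*}
```

With $w(x) \equiv 1$ the objective reduces to standard DPO, so the weighting only changes how strongly each pair contributes to the gradient: prompts the model rarely solves (low $\hat{a}$) get weights near 1, while prompts it always solves get $w_{\min}$.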

Findings

1. Improved accuracy in mathematical reasoning tasks.
2. Seamless integration with existing preference optimization methods.
3. Enhanced learning efficiency for large language models.

Abstract

Recently, preference optimization methods such as DPO have significantly enhanced large language models (LLMs) across a wide range of tasks, including dialogue and question answering. However, current methods fail to account for the varying difficulty of training samples during preference optimization, leading to mediocre performance on tasks with high accuracy requirements, particularly mathematical reasoning. To address this limitation, we propose a novel training framework that samples multiple outputs to analyze the model's output distribution, assigns different weights to training samples accordingly, and incorporates these weights into the preference optimization process. This plug-and-play approach enables LLMs to prioritize challenging examples during training, improving learning efficiency. Experimental results demonstrate that our framework integrates seamlessly with various preference optimization methods and…
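The pipeline the abstract describes (sample several outputs, turn them into a per-sample weight, fold the weight into the preference loss) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes DPO as the base method, a binary correctness check for each sampled response (natural for math problems with verifiable answers), and a hypothetical linear mapping from sampled accuracy to weight. The names `difficulty_weight`, `weighted_dpo_loss`, `batch_loss`, and `min_weight` are illustrative only.

```python
import math
from typing import Sequence


def difficulty_weight(correct_flags: Sequence[bool], min_weight: float = 0.1) -> float:
    """Hypothetical difficulty weight computed from K sampled responses.

    correct_flags[i] is True if the i-th sampled answer to a prompt is correct.
    Low sampled accuracy -> hard prompt -> weight near 1.0; prompts the model
    always solves get min_weight (kept > 0 so easy pairs are not discarded).
    This linear mapping is an assumption, not the paper's formula.
    """
    if not correct_flags:
        raise ValueError("need at least one sampled response")
    accuracy = sum(correct_flags) / len(correct_flags)
    return min_weight + (1.0 - min_weight) * (1.0 - accuracy)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    weight: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (chosen, rejected) pair, scaled by a difficulty weight.

    Arguments are sequence log-probabilities under the trained policy and
    the frozen reference model.
    """
    # Implicit reward margin used by standard DPO.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Unweighted DPO loss is -log sigmoid(margin); the weight rescales it.
    return -weight * log_sigmoid(margin)


def batch_loss(examples: Sequence[dict], beta: float = 0.1) -> float:
    """Mean weighted loss over a batch of preference pairs."""
    total = 0.0
    for ex in examples:
        w = difficulty_weight(ex["correct_flags"])
        total += weighted_dpo_loss(
            ex["policy_chosen"], ex["policy_rejected"],
            ex["ref_chosen"], ex["ref_rejected"],
            weight=w, beta=beta,
        )
    return total / len(examples)


if __name__ == "__main__":
    # Same log-probabilities, different sampled accuracy (8/8 vs 1/8 correct).
    easy = {"correct_flags": [True] * 8, "policy_chosen": -10.0, "policy_rejected": -12.0,
            "ref_chosen": -11.0, "ref_rejected": -11.0}
    hard = dict(easy, correct_flags=[True] + [False] * 7)
    print(batch_loss([easy]), batch_loss([hard]))  # the hard pair contributes more
```

Because the weight multiplies an otherwise unchanged per-pair loss, the same `difficulty_weight` could in principle be paired with other pairwise preference objectives, which is one reading of "plug-and-play"; trying a different weighting scheme only requires changing `difficulty_weight`.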

Taxonomy

Topics: Robotic Path Planning Algorithms · Robotics and Automated Systems

Methods: Direct Preference Optimization