As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

Xin Mao; Feng-Lin Li; Huimin Xu; Wei Zhang; Wang Chen; Anh Tuan Luu

arXiv:2410.04834 · cs.CL · October 28, 2024

TL;DR

This paper introduces a simple, hyper-parameter-free bidirectional negative feedback loss for aligning large language models, improving stability and performance on reasoning tasks while maintaining efficiency.

Contribution

It proposes a novel BNF loss that simplifies LLM alignment by removing the need for pairwise data and hyper-parameter tuning, enhancing stability and reasoning ability.

Findings

1. BNF achieves comparable QA performance to state-of-the-art methods.

2. BNF shows significantly less performance decline on reasoning benchmarks.

3. Extensive experiments validate BNF's effectiveness and stability.

Abstract

Direct Preference Optimization (DPO) has emerged as a more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO), eliminating the need for reward models and online sampling. Despite these benefits, DPO and its variants remain sensitive to hyper-parameters and prone to instability, particularly on mathematical datasets. We argue that these issues arise from the unidirectional likelihood-derivative negative feedback inherent in the log-likelihood loss function. To address this, we propose a novel LLM alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization. Our proposed BNF loss eliminates the need for pairwise contrastive losses and does not require any extra tunable hyper-parameters or pairwise preference data, streamlining the alignment pipeline to be as simple as…
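For orientation, the pairwise contrastive objective that the abstract contrasts BNF against can be sketched as follows. This is a minimal illustration of the standard DPO loss for a single preference pair, not the BNF loss, whose form is not given in this listing. The function names, the scalar log-probability inputs, and the default `beta=0.1` are assumptions made for illustration. The helper `log_likelihood_grad` shows one plausible reading of the "unidirectional" feedback the abstract mentions: the derivative of log p with respect to p is 1/p, so the push shrinks as a likelihood is raised toward 1 but grows as it is lowered toward 0. That reading is an interpretation, not the paper's derivation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; the tunable hyper-parameter DPO needs
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # Log-ratio of policy to frozen reference model for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss depends only on the margin between the two log-ratios.
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def log_likelihood_grad(p: float) -> float:
    """d/dp of log(p): the size of the gradient signal at likelihood p."""
    return 1.0 / p


if __name__ == "__main__":
    # The policy prefers the chosen response more than the reference does.
    print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
    # Gradient magnitude near p = 1 versus near p = 0.
    print(log_likelihood_grad(0.9), log_likelihood_grad(0.01))
```

Note that this objective needs both a preference pair and a value for `beta`, which are the two requirements the abstract says BNF removes.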

Taxonomy

Topics: Iterative Learning Control Systems

Methods: Direct Preference Optimization