As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Feng-Lin Li, Huimin Xu, Wei Zhang, Wang Chen, Anh Tuan Luu

TL;DR
This paper introduces a simple, hyper-parameter-free bidirectional negative feedback loss for aligning large language models, improving stability and performance on reasoning tasks while maintaining efficiency.
Contribution
It proposes a novel BNF loss that simplifies LLM alignment by removing the need for pairwise data and hyper-parameter tuning, enhancing stability and reasoning ability.
Findings
BNF achieves comparable QA performance to state-of-the-art methods.
BNF shows significantly less performance decline on reasoning benchmarks.
Extensive experiments validate BNF's effectiveness and stability.
Abstract
Direct Preference Optimization (DPO) has emerged as a more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO), eliminating the need for reward models and online sampling. Despite these benefits, DPO and its variants remain sensitive to hyper-parameters and prone to instability, particularly on mathematical datasets. We argue that these issues arise from the unidirectional likelihood-derivative negative feedback inherent in the log-likelihood loss function. To address this, we propose a novel LLM alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization. Our proposed BNF loss eliminates the need for pairwise contrastive losses and does not require any extra tunable hyper-parameters or pairwise preference data, streamlining the alignment pipeline to be as simple as…
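The abstract contrasts DPO, whose pairwise loss carries a tunable temperature, with the proposed BNF loss. This page does not give the BNF formula, so the sketch below does not implement BNF. It shows two things under our own assumptions. The first is the standard DPO loss for a single preference pair. The second is one reading of the "unidirectional" feedback of the log-likelihood loss: the gradient magnitude |d log p / dp| = 1/p. That magnitude shrinks as p rises toward 1, which makes raising p self-limiting. It grows without bound as p falls toward 0, which makes lowering p self-reinforcing. The function names (`dpo_loss`, `loglik_grad_magnitude`) and the default `beta = 0.1` are illustrative choices, not taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    # For x < 0: log(e^x / (1 + e^x)) = x - log(1 + e^x)
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy and a frozen
    reference model. beta is the tunable temperature (illustrative default).
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


def loglik_grad_magnitude(p: float) -> float:
    """|d log(p) / dp| = 1 / p for a probability p in (0, 1]."""
    return 1.0 / p


if __name__ == "__main__":
    # Same log-ratio margin (2.0), different beta: the loss value, and hence
    # the gradient scale, changes with beta (hyper-parameter sensitivity).
    for beta in (0.01, 0.1, 1.0):
        print(beta, dpo_loss(-10.0, -12.0, -10.0, -10.0, beta=beta))

    # Raising p of a preferred token: the push weakens as p -> 1.
    # Lowering p of a dispreferred token: the push strengthens as p -> 0.
    for p in (0.9, 0.5, 0.1, 0.01):
        print(p, loglik_grad_magnitude(p))
```

The 1/p growth near zero is only an intuition for the instability the abstract describes. It is not a derivation of the BNF loss.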
Taxonomy
Topics: Iterative Learning Control Systems
Methods: Direct Preference Optimization
