Binary Classifier Optimization for Large Language Model Alignment

Seungjae Jung; Gunsoo Han; Daniel Wontae Nam; Kyoung-Woon On

arXiv:2404.04656 · cs.LG · June 10, 2025 · 3 cites

TL;DR

This paper introduces Binary Classifier Optimization (BCO), a method that aligns large language models using only binary feedback, such as thumbs-up or thumbs-down. BCO performs comparably to existing preference-based methods across multiple datasets and models.

Contribution

The paper presents BCO, an approach that aligns LLMs using only binary feedback. It is supported by a theoretical result (the binary cross-entropy loss used to train the classifier upper-bounds the DPO loss) and by experiments on multiple datasets and models.

Findings

01 BCO performs on par with DPO on preference datasets

02 BCO demonstrates robustness across different models and datasets

03 Binary cross-entropy acts as an upper bound for DPO loss
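
The upper-bound finding can be checked with a short derivation. This is a sketch under assumed notation, not the paper's own proof: r_w and r_l stand for the implicit rewards (classifier logits) of a preferred and a dispreferred response, and σ is the logistic sigmoid.

```latex
\begin{align*}
\sigma(r_w)\,\sigma(-r_l)
  &= \frac{1}{(1 + e^{-r_w})(1 + e^{r_l})}
   = \frac{1}{1 + e^{-r_w} + e^{r_l} + e^{-(r_w - r_l)}} \\
  &\le \frac{1}{1 + e^{-(r_w - r_l)}}
   = \sigma(r_w - r_l).
\end{align*}
% Taking the negative logarithm of both sides reverses the inequality:
\begin{equation*}
\underbrace{-\log \sigma(r_w - r_l)}_{\text{DPO loss}}
  \;\le\;
\underbrace{-\log \sigma(r_w) \;-\; \log \sigma(-r_l)}_{\text{binary cross-entropy loss}}.
\end{equation*}
```

The inequality holds because the extra terms in the denominator are positive. Subtracting the same constant from both r_w and r_l leaves the left-hand side unchanged, so the bound also holds for shifted rewards.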

Abstract

In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as 'thumbs-up' or 'thumbs-down'. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between…
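
The idea of training a binary classifier whose logit serves as an implicit reward can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the implicit reward is taken to be the usual DPO form, β times the log-ratio of policy to reference likelihood, and `delta` is a placeholder for a reward shift whose actual choice is not given in the text above. It is not the authors' implementation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Assumed DPO-style implicit reward: beta * log(pi_theta / pi_ref)."""
    return beta * (logp_policy - logp_ref)


def binary_feedback_loss(rewards, labels, delta: float = 0.0) -> float:
    """Mean binary cross-entropy over unpaired examples.

    rewards: implicit rewards, one per (prompt, response) example.
    labels:  1 for thumbs-up, 0 for thumbs-down.
    delta:   hypothetical reward shift subtracted from every logit.
    """
    total = 0.0
    for r, y in zip(rewards, labels):
        logit = r - delta
        # Thumbs-up pushes the logit up; thumbs-down pushes it down.
        total += -log_sigmoid(logit) if y == 1 else -log_sigmoid(-logit)
    return total / len(rewards)


def dpo_loss(r_chosen: float, r_rejected: float) -> float:
    """DPO loss for one preference pair, written in terms of implicit rewards."""
    return -log_sigmoid(r_chosen - r_rejected)


# Illustrative log-likelihoods (made-up numbers, not from the paper).
r_up = implicit_reward(logp_policy=-12.0, logp_ref=-14.0)    # thumbs-up response
r_down = implicit_reward(logp_policy=-15.0, logp_ref=-13.0)  # thumbs-down response

# Summed BCE over the two examples versus the DPO loss on the same pair.
bce_sum = 2 * binary_feedback_loss([r_up, r_down], [1, 0])
print("BCE (sum):", bce_sum)
print("DPO:      ", dpo_loss(r_up, r_down))
```

The classifier loss needs only a label per response, so thumbs-up and thumbs-down examples do not have to come from the same prompt. When they do form a pair, the summed binary cross-entropy is never smaller than the DPO loss on that pair, for any value of `delta`.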

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies

Methods: Direct Preference Optimization · Balanced Selection