Binary Classifier Optimization for Large Language Model Alignment
Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On

TL;DR
This paper introduces Binary Classifier Optimization (BCO), a novel method for aligning large language models using only simple binary feedback, which performs comparably to existing preference-based methods across multiple datasets and models.
Contribution
The paper presents BCO, a new approach that effectively aligns LLMs using only binary feedback, with theoretical insights and practical validation on diverse datasets.
Findings
BCO performs on par with DPO on preference datasets
BCO demonstrates robustness across different models and datasets
Binary cross-entropy acts as an upper bound for DPO loss
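The upper-bound finding can be checked by hand. The following is a short derivation sketch, not the paper's own proof. It assumes the DPO loss is written in terms of implicit rewards, with r_w for the preferred response and r_l for the dispreferred one (this notation is ours), and with σ the logistic sigmoid.

```latex
\begin{align*}
\sigma(r_w)\,\sigma(-r_l)
  &= \frac{1}{(1+e^{-r_w})(1+e^{r_l})}
   = \frac{1}{1 + e^{-r_w} + e^{r_l} + e^{-(r_w - r_l)}} \\
  &\le \frac{1}{1 + e^{-(r_w - r_l)}}
   = \sigma(r_w - r_l), \\[4pt]
\underbrace{-\log\sigma(r_w - r_l)}_{\text{DPO loss}}
  &\le \underbrace{-\log\sigma(r_w) \;-\; \log\sigma(-r_l)}_{\text{binary cross-entropy on the two responses}} .
\end{align*}
```

The inequality in the second line holds because the dropped terms e^{-r_w} and e^{r_l} are positive; taking the negative logarithm of both sides gives the bound.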
Abstract
In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as 'thumbs-up' or 'thumbs-down'. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between…
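As an illustration of the idea in the abstract, here is a minimal sketch of a binary cross-entropy objective over implicit rewards, with a numerical check that it upper-bounds the DPO loss on a pair. The function names, the default `beta`, and the DPO-style form of the implicit reward (a scaled log-ratio between policy and reference log-probabilities) are assumptions made for this example. The reward shift mentioned in the abstract is not modeled.

```python
import math
import random


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return min(z, 0.0) - math.log1p(math.exp(-abs(z)))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Assumed DPO-style implicit reward: beta * log(pi_theta / pi_ref)."""
    return beta * (logp_policy - logp_ref)


def bce_loss(reward: float, thumbs_up: bool) -> float:
    """Binary cross-entropy with the reward used as the classifier logit."""
    return -log_sigmoid(reward) if thumbs_up else -log_sigmoid(-reward)


def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """DPO loss on one preference pair, expressed with implicit rewards."""
    return -log_sigmoid(reward_chosen - reward_rejected)


def check_upper_bound(trials: int = 1000, seed: int = 0) -> bool:
    """Check BCE(chosen, up) + BCE(rejected, down) >= DPO loss on random rewards."""
    rng = random.Random(seed)
    for _ in range(trials):
        r_w = rng.uniform(-5.0, 5.0)
        r_l = rng.uniform(-5.0, 5.0)
        bce = bce_loss(r_w, True) + bce_loss(r_l, False)
        if dpo_loss(r_w, r_l) > bce + 1e-12:
            return False
    return True


if __name__ == "__main__":
    # A thumbs-up response that the policy likes more than the reference does.
    r = implicit_reward(logp_policy=-12.0, logp_ref=-15.0)
    print("reward:", r, "loss:", bce_loss(r, thumbs_up=True))
    print("bound holds on random pairs:", check_upper_bound())
```

Note that `bce_loss` needs only a single response and its binary label, so it applies to unpaired thumbs-up or thumbs-down feedback, whereas `dpo_loss` requires a chosen and a rejected response for the same prompt.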
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: Direct Preference Optimization · Balanced Selection
