Multi-Response Preference Optimization with Augmented Ranking Dataset
Hansle Gwon, Imjin Ahn, Young-Hak Kim, Sanghyun Park, Tae Joon Jun

TL;DR
This paper introduces a novel dataset augmentation method and a multi-response training approach for preference optimization in LLMs, enhancing their ability to learn from multiple human preferences simultaneously.
Contribution
It proposes a new dataset augmentation technique and a multi-response training method for preference optimization, addressing dataset quality sensitivity and enabling multi-response learning.
Findings
Improved performance in preference optimization tasks.
Effective learning of multiple responses simultaneously.
Enhanced dataset robustness and quality.
Abstract
Recent advancements in Large Language Models (LLMs) have been remarkable, with new models consistently surpassing their predecessors. These advancements are underpinned by extensive research on various training mechanisms. Among these, Preference Optimization has played a significant role in improving the performance of LLMs by incorporating human preferences into the training process. However, constructing preference optimization datasets is challenging and the optimization process is highly sensitive to the dataset quality. In this study, we propose a novel approach to augment Preference Optimization datasets. Additionally, we introduce a Multi-response-based Preference Optimization training method that enables the simultaneous learning of multiple responses.
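The abstract does not state the training objective, so the following is only an illustrative sketch and not the authors' method. One common way to extend pairwise preference optimization to a ranked list of responses is a Plackett-Luce ranking likelihood over DPO-style implicit rewards. The function name `plackett_luce_loss`, its parameters, and the default `beta` are assumptions introduced here for illustration.

```python
import math
from typing import Sequence


def plackett_luce_loss(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Negative log-likelihood of one ranking under a Plackett-Luce model.

    Hypothetical helper, not taken from the paper. Both inputs hold the
    summed log-probability of each response, ordered from most preferred
    to least preferred.
    """
    if len(policy_logps) != len(ref_logps):
        raise ValueError("policy_logps and ref_logps must have equal length")
    if len(policy_logps) < 2:
        raise ValueError("need at least two ranked responses")

    # DPO-style implicit reward: scaled log-ratio of policy to reference.
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]

    loss = 0.0
    # At each rank k, the k-th response should win against all lower-ranked ones.
    for k in range(len(rewards) - 1):
        tail = rewards[k:]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(x - m) for x in tail))
        loss += log_norm - rewards[k]
    return loss
```

With exactly two responses this expression reduces to the familiar pairwise loss, the negative log-sigmoid of the reward difference, so a ranked list of K responses can be seen as supplying K - 1 nested comparisons in a single training step.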
Taxonomy
Topics: Technology and Data Analysis · Internet of Things and Social Network Interactions · Korean Urban and Social Studies
