Self-Consistency Preference Optimization
Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu

TL;DR
This paper introduces Self-Consistency Preference Optimization (ScPO), a training method that iteratively teaches a model to prefer its own consistent answers over inconsistent ones, improving accuracy on reasoning benchmarks such as GSM8K and MATH.
Contribution
The paper extends self-consistency from an inference-time technique to a training signal. ScPO improves performance on reasoning tasks without gold labels, closing the gap with supervised training.
Findings
ScPO improves reasoning accuracy on the GSM8K and MATH benchmarks over conventional reward model training.
Combining ScPO with supervised learning yields further performance gains.
On the ZebraLogic benchmark, ScPO fine-tunes Llama-3 8B to outperform larger models such as Llama-3 70B.
Abstract
Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area. However, existing techniques often fail to improve complex reasoning tasks due to the difficulty of assigning correct rewards. An orthogonal approach that is known to improve correctness is self-consistency, a method applied at inference time based on multiple sampling in order to find the most consistent answer. In this work, we extend the self-consistency concept to help train models. We thus introduce self-consistency preference optimization (ScPO), which iteratively trains consistent answers to be preferred over inconsistent ones on unsupervised new problems. We show ScPO leads to large improvements over conventional reward model training on reasoning tasks such as GSM8K and MATH, closing the gap with supervised training with gold answers or preferences, and that…
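The idea of preferring consistent answers over inconsistent ones can be illustrated with a minimal sketch. It assumes each problem has several sampled responses, each reduced to a final answer, and it picks the most-voted answer as "chosen" and the least-voted as "rejected". The function name build_preference_pair, the min_margin threshold, and the most-voted versus least-voted pairing are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def build_preference_pair(responses, min_margin=1):
    """Build one (chosen, rejected) pair from sampled responses to a problem.

    responses: list of (reasoning_text, final_answer) tuples.
    min_margin: hypothetical threshold on the vote gap; pairs with a
                smaller gap are skipped as too ambiguous to train on.
    Returns (chosen, rejected, margin), or None if no usable pair exists.
    """
    # Self-consistency vote: count how often each final answer appears.
    counts = Counter(answer for _, answer in responses)
    if len(counts) < 2:
        return None  # every sample agrees, so there is nothing to contrast

    ranked = counts.most_common()  # sorted by vote count, highest first
    best_answer, best_votes = ranked[0]
    worst_answer, worst_votes = ranked[-1]

    margin = best_votes - worst_votes
    if margin < min_margin:
        return None

    # Take the first sampled response that reaches each answer.
    chosen = next(r for r in responses if r[1] == best_answer)
    rejected = next(r for r in responses if r[1] == worst_answer)
    return chosen, rejected, margin


samples = [
    ("path a", "18"), ("path b", "18"), ("path c", "18"),
    ("path d", "20"), ("path e", "17"), ("path f", "20"),
]
pair = build_preference_pair(samples)
```

Pairs built this way need no gold answer: the vote count stands in for a correctness signal, and a preference-optimization loss could then be applied to the chosen and rejected responses in each training iteration.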
Taxonomy
Topics: Collaboration in agile enterprises · Service-Oriented Architecture and Web Services · Advanced Software Engineering Methodologies
