ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions

Xu Zhang; Xunjian Yin; Xiaojun Wan

arXiv:2406.08842 · cs.CL · June 14, 2024

Open Access

TL;DR

ContraSolver is an unsupervised algorithm that improves large language models by identifying and resolving internal preference contradictions through a self-annotated preference graph, leading to better alignment.

Contribution

The paper introduces ContraSolver, a novel method for self-aligning LLMs by detecting and resolving preference contradictions without supervision.

Findings

01. Significant performance improvements on four generation tasks.

02. Reduction in preference contradictions after self-alignment.

03. Quantitative evidence of better alignment quality.

Abstract

While substantial advancements have been made in developing large language models (LLMs), achieving control over their behavior can be difficult. Direct preference optimization (DPO) assumes the existence of a latent reward function to evaluate the responses of LLMs. This assumption indicates a strict preference ordering of different responses to the same input. However, there always exist contradictions of preference in LLMs according to our experimental observations. In this paper, we construct a graph structure of the preference relationship among different responses with self-annotation to find contradictions in the preference order. We propose ContraSolver, an algorithm that traverses all edges on the preference graph to identify those that might cause contradictions. ContraSolver initializes the graph with a maximum spanning tree and identifies contradictory edges, prioritizing…
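
The abstract is cut off before the algorithm is fully described, so the following is only a rough Python sketch of the general idea (a preference graph, a maximum-spanning-tree initialization, and a pass over the remaining edges to flag those that would contradict it). It is not the authors' implementation. The edge format `(preferred, rejected, confidence)`, the use of a confidence score as the edge weight, the function name `split_edges`, and the rule "an edge is contradictory if it would close a directed cycle" are all assumptions made for illustration.

```python
from collections import defaultdict


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def reaches(adj, src, dst):
    """Return True if dst is reachable from src along directed edges."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def split_edges(edges):
    """Split preference edges into (kept, contradictory).

    edges: iterable of (preferred, rejected, confidence) tuples.
    Assumption: higher confidence means a more trusted preference.
    """
    ordered = sorted(edges, key=lambda e: e[2], reverse=True)
    parent = {}
    for winner, loser, _ in ordered:
        parent.setdefault(winner, winner)
        parent.setdefault(loser, loser)

    adj = defaultdict(set)  # directed graph of accepted preferences
    kept, contradictory, remaining = [], [], []

    # Step 1: Kruskal-style maximum spanning tree on the undirected skeleton.
    for winner, loser, conf in ordered:
        root_w, root_l = find(parent, winner), find(parent, loser)
        if root_w != root_l:
            parent[root_w] = root_l
            adj[winner].add(loser)
            kept.append((winner, loser, conf))
        else:
            remaining.append((winner, loser, conf))

    # Step 2: visit the other edges, most confident first; an edge that
    # would close a directed cycle is flagged instead of being added.
    for winner, loser, conf in remaining:
        if reaches(adj, loser, winner):
            contradictory.append((winner, loser, conf))
        else:
            adj[winner].add(loser)
            kept.append((winner, loser, conf))
    return kept, contradictory


# Toy self-annotated preferences over three responses (hypothetical data).
toy_edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "a", 0.3), ("a", "c", 0.6)]
kept_edges, contradictory_edges = split_edges(toy_edges)
```

In this toy input, the low-confidence edge preferring `c` over `a` is the one flagged, since `a` over `b` and `b` over `c` are already accepted. How the flagged edges are then used for training is not covered by the visible part of the abstract and is left out here.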

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation