Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Jihan Yao; Wenxuan Ding; Shangbin Feng; Lucy Lu Wang; Yulia Tsvetkov

arXiv:2410.11055·cs.CL·October 16, 2024

TL;DR

This paper explores how large language models can be aligned to prefer certain wrong answers over others, revealing that such alignment can reduce errors and improve calibration even without reliable annotations.

Contribution

It introduces methods to elicit and utilize wrong-over-wrong preferences for aligning LLMs, demonstrating their effectiveness across multiple models and datasets.

Findings

01

LLMs can distinguish shades of wrong among incorrect answers, performing up to 20.9% better than random guessing.

02

Aligning with wrong-over-wrong preferences reduces errors and sometimes yields correct answers.

03

Alignment improves model calibration and understanding of wrong options.
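
To make the calibration finding concrete, here is a minimal sketch of expected calibration error (ECE), one common way to quantify how well a model's stated confidence matches its accuracy. This page does not say which calibration metric the paper reports, so the metric choice, the function name, and the binning scheme below are all assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width confidence bins (assumed metric).

    confidences: floats in [0, 1], the model's confidence in each answer.
    correct: booleans, whether each answer was right.
    """
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    # Weighted average of |mean confidence - accuracy| over non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A lower value means confidence tracks accuracy more closely. Under this sketch, "improved calibration" would show up as a smaller ECE after alignment than before.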

Abstract

In the absence of abundant reliable annotations for challenging tasks and contexts, how can we expand the frontier of LLM capabilities with potentially wrong answers? We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) Would alignment with such wrong-over-wrong preferences be helpful? We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences. Extensive experiments with seven LLMs and eight datasets demonstrate that (1) LLMs do have preliminary capability in distinguishing various shades of wrong, achieving up to 20.9% higher performance than random guess; (2) Alignment with wrong-over-wrong preferences helps LLMs to produce less wrong and sometimes even…
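
The pipeline described in the abstract (score wrong answers with a proxy, then fine-tune on the resulting preferences) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the `wrongness` scores stand in for whatever a self-consistency, token-probability, or LLM-as-a-judge proxy would produce, and the `margin` filter and the `prompt`/`chosen`/`rejected` record layout are assumptions.

```python
from itertools import combinations


def build_wrong_over_wrong_pairs(question, answers, wrongness, margin=0.1):
    """Turn proxy wrongness scores into preference pairs (illustrative only).

    wrongness: maps each answer to a hypothetical proxy score in [0, 1],
               where higher means "more wrong".
    margin:    assumed threshold; pairs with a smaller score gap are
               skipped as too ambiguous to express a preference.
    """
    pairs = []
    for a, b in combinations(answers, 2):
        gap = wrongness[a] - wrongness[b]
        if abs(gap) < margin:
            continue
        # The less-wrong answer is preferred over the more-wrong one.
        chosen, rejected = (b, a) if gap > 0 else (a, b)
        pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs


# Usage: three wrong answers to "What is 6 * 7?" with made-up proxy scores.
example_pairs = build_wrong_over_wrong_pairs(
    "What is 6 * 7?",
    ["41", "43", "400"],
    {"41": 0.20, "43": 0.25, "400": 0.90},
)
```

Here the two near-miss answers are too close in score to rank against each other, so only the pairs preferring each of them over "400" are kept. Records of this shape could then be passed to a preference optimization method of the kind the abstract mentions.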

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

This paper introduces quite a novel approach to alignment in the absence of ground truth. The paper proposes an alignment methodology called wrong-over-wrong alignment. Such alignment allows any LLM to learn the correctness of an answer (for the domain it is trained/aligned on) simply by distinguishing between varying shades of wrong. This is quite a valuable approach to solving alignment, especially when human-vetted ground truth is missing, which is more often than not in any practical

Weaknesses

In my opinion, the biggest weakness of the method presented in this paper is its reliance on proxy functions. While the main objective of the paper is to explore alignment without dependence on correct answers, it nevertheless relies on proxy functions to elicit the wrongness of answers. The proxy methods come with their own limitations, and I worry that in real-world practice they may inadvertently introduce biases or even inaccuracies. Authors themselves have called out a possible introduction of

Reviewer 02 · Rating 6 · Confidence 3

Strengths

* Creative approach that addresses alignment without needing correct answers.
* Solid, methodologically rigorous results across diverse datasets, validated with multiple LLMs.

Weaknesses

* The results and analysis sections are presented in a "bullet point" format without logical transitions, making it feel like a collection of findings rather than a cohesive story.
* Most tables, especially Table 1, are dense and challenging to interpret. Consider breaking them down, or move detailed tables to an appendix and keep summary statistics or visualizations.
* In the experimental settings section, more context around the overall setup would help; for instance, the mention of multiple-

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The paper explores a novel concept of "wrong-over-wrong" alignment, presenting an approach that diverges from traditional correct-answer evaluations, especially in contexts where correct answers may be absent.
- It raises important questions about how incorrect answers can still provide value in training and evaluation processes, potentially paving the way for further research in this area.

Weaknesses

1. The framing of the concept could be improved to clearly communicate the utility of wrong-over-wrong alignment. The lack of explicit applications or contexts weakens the paper's stance. For instance, theorem-proving and low-resource languages are mentioned but the paper does not carry out any experiments on the same. Authors mentioned in the results that knowledge-based tasks had better performance but these tasks inherently contain correct answers which diverge from the proposed applications.

Code & Models

Repositories

yaojh18/Varying-Shades-of-Wrong
pytorch · Official


Taxonomy

Topics: Law, AI, and Intellectual Property · Artificial Intelligence in Law · Law, Economics, and Judicial Systems

MethodsFocus