Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng, Lucy Lu Wang, Yulia Tsvetkov

TL;DR
This paper explores how large language models can be aligned to prefer certain wrong answers over others, revealing that such alignment can reduce errors and improve calibration even without reliable annotations.
Contribution
It introduces methods to elicit and utilize wrong-over-wrong preferences for aligning LLMs, demonstrating their effectiveness across multiple models and datasets.
Findings
LLMs can distinguish shades of wrong answers, performing up to 20.9% better than random guessing.
Aligning with wrong-over-wrong preferences reduces errors and sometimes yields correct answers.
Alignment improves model calibration and understanding of wrong options.
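The calibration finding can be made concrete with a small sketch. The snippet below computes expected calibration error (ECE) with equal-width confidence bins, a common way to quantify calibration; this page does not say which calibration metric the paper uses, so the choice of ECE, the function name `expected_calibration_error`, and the `n_bins` parameter are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative; not confirmed as the paper's metric).

    confidences: model confidence per example, each in [0, 1]
    correct:     booleans, True where the model's answer was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each example to a confidence bin; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Sum the |accuracy - mean confidence| gaps, weighted by bin size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means stated confidence tracks actual accuracy more closely, so comparing this number before and after alignment is one way to check a calibration claim.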
Abstract
In the absence of abundant reliable annotations for challenging tasks and contexts, how can we expand the frontier of LLM capabilities with potentially wrong answers? We focus on two research questions: (1) Can LLMs generate reliable preferences among wrong options? And if so, (2) Would alignment with such wrong-over-wrong preferences be helpful? We employ methods based on self-consistency, token probabilities, and LLM-as-a-judge to elicit wrong-over-wrong preferences, and fine-tune language models with preference optimization approaches using these synthesized preferences. Extensive experiments with seven LLMs and eight datasets demonstrate that (1) LLMs do have preliminary capability in distinguishing various shades of wrong, achieving up to 20.9% higher performance than random guess; (2) Alignment with wrong-over-wrong preferences helps LLMs to produce less wrong and sometimes even…
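As a rough illustration of the pipeline the abstract describes, the sketch below combines an LLM-as-a-judge pairwise comparison with a simple consistency filter and emits preference pairs. It is not the authors' implementation: the names `majority_preference`, `build_preference_pairs`, and `toy_judge`, the `n_samples` and `min_agreement` parameters, and the `prompt`/`chosen`/`rejected` record layout (a layout commonly used for preference-optimization data) are all assumptions. The toy judge stands in for a real model call and uses a hidden numeric reference only so the example runs deterministically.

```python
from collections import Counter
from itertools import combinations


def majority_preference(votes, min_agreement=0.6):
    """Return 'A' or 'B' if enough sampled verdicts agree, else None."""
    if not votes:
        return None
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count / len(votes) >= min_agreement else None


def build_preference_pairs(prompt, wrong_answers, judge,
                           n_samples=5, min_agreement=0.6):
    """Compare every pair of wrong answers and keep the consistent verdicts.

    judge(prompt, a, b) is a hypothetical callable that returns 'A' if
    answer a is judged less wrong than answer b, and 'B' otherwise.
    """
    pairs = []
    for a, b in combinations(wrong_answers, 2):
        # Sample the judge several times; drop pairs with no clear majority.
        votes = [judge(prompt, a, b) for _ in range(n_samples)]
        winner = majority_preference(votes, min_agreement)
        if winner is None:
            continue
        chosen, rejected = (a, b) if winner == "A" else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def toy_judge(prompt, a, b, _reference=12):
    """Stand-in for an LLM call: prefers the answer numerically closer to 12."""
    return "A" if abs(int(a) - _reference) <= abs(int(b) - _reference) else "B"


pairs = build_preference_pairs("What is 7 + 5?", ["10", "20", "11"], toy_judge)
```

The resulting records could then be passed to a preference-optimization trainer. With a real, stochastic judge, the agreement threshold controls how much noisy supervision is filtered out before fine-tuning.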
Peer Reviews
Decision: ICLR 2025 Poster
This paper introduces quite a novel approach to alignment in the absence of ground truth. The paper proposes an alignment methodology called wrong-over-wrong alignment. Such alignment allows any LLM to learn the correctness of an answer (for the domain it is trained/aligned on) simply by distinguishing between varying shades of wrong. This is quite a valuable approach to alignment, especially when human-vetted ground truth is missing, which is more often than not in any practical
The biggest weakness, in my opinion, of the method presented in this paper is its reliance on proxy functions. While the main objective of the paper is to explore alignment without dependence on correct answers, the paper nonetheless relies on proxy functions to elicit the wrongness of answers. The proxy methods come with their own limitations, and I worry that in real-world practice they may inadvertently introduce biases or even inaccuracies. Authors themselves have called out a possible introduction of
* Creative approach that addresses alignment without needing correct answers.
* Solid, methodologically rigorous results across diverse datasets, validated with multiple LLMs.
* The results and analysis sections are presented in a "bullet point" format without logical transitions, making it feel like a collection of findings rather than a cohesive story.
* Most tables, especially Table 1, are dense and challenging to interpret. Consider breaking them down, or move detailed tables to an appendix and keep summary statistics or visualizations.
* In the experimental settings section, more context around the overall setup would help; for instance, the mention of multiple-
- The paper explores a novel concept of "wrong-over-wrong" alignment, presenting an approach that diverges from traditional correct-answer evaluations, especially in contexts where correct answers may be absent.
- It raises important questions about how incorrect answers can still provide value in training and evaluation processes, potentially paving the way for further research in this area.
1. The framing of the concept could be improved to clearly communicate the utility of wrong-over-wrong alignment. The lack of explicit applications or contexts weakens the paper's stance. For instance, theorem-proving and low-resource languages are mentioned, but the paper does not carry out any experiments on them. The authors note in the results that knowledge-based tasks had better performance, but these tasks inherently have correct answers, which diverges from the proposed applications.
Taxonomy
Topics: Law, AI, and Intellectual Property · Artificial Intelligence in Law · Law, Economics, and Judicial Systems
MethodsFocus
