Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs
Abinitha Gourabathina, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Prasanna Sattigeri

TL;DR
This paper introduces Trace Inversion, an abstention method for LLMs that detects when a model has answered the wrong question by comparing the initial query with a query reconstructed from the model's reasoning trace. It outperforms baselines in 33 of 36 settings.
Contribution
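The paper proposes the Query Misalignment Framework, which reinterprets hallucinations that lead to failed abstention as the model answering the wrong question rather than answering a question incorrectly. Building on this framework, it develops Trace Inversion, a class of abstention methods that reconstructs the most likely query from the reasoning trace alone and compares it with the initial query.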
Findings
Trace Inversion outperforms baselines in 33 of 36 settings.
It improves abstention performance across four large language models.
The method detects wrong-question answers by comparing the initial query with the query reconstructed from the reasoning trace.
Abstract
For Large Language Models (LLMs) to be reliably deployed, models must effectively know when not to answer: abstain. Reasoning models, in particular, have gained attention for impressive performance on complex tasks. However, reasoning models have been shown to have worse abstention abilities. Taking the vulnerabilities of reasoning models into account, we propose our Query Misalignment Framework. Hallucinations resulting in failed abstention can be reinterpreted as LLMs answering the wrong question (rather than answering a question incorrectly). Based on this framework, we develop a new class of state-of-the-art abstention methods called Trace Inversion. First, we generate the reasoning trace of a model. Based on only the trace, we then reconstruct the most likely query that the model responded to. Finally, we compare the initial query with the reconstructed query. Low similarity score…
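The three steps described in the abstract (generate a trace, reconstruct the query from the trace alone, compare the two queries) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate_trace` and `reconstruct_query` are hypothetical callables standing in for model calls, token-level Jaccard overlap is a stand-in for whatever similarity measure the paper uses, and the `threshold` value and the rule "abstain when similarity is low" are assumptions suggested by, but not stated in, the truncated abstract.

```python
import re
from typing import Callable


def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard overlap; a stand-in similarity, not the paper's metric."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def should_abstain(
    query: str,
    generate_trace: Callable[[str], str],      # hypothetical: query -> reasoning trace
    reconstruct_query: Callable[[str], str],   # hypothetical: trace -> most likely query
    similarity: Callable[[str, str], float] = jaccard_similarity,
    threshold: float = 0.5,                    # assumed cutoff, chosen for illustration
) -> bool:
    """Return True when the reconstructed query diverges from the initial query."""
    # Step 1: generate the model's reasoning trace for the initial query.
    trace = generate_trace(query)
    # Step 2: reconstruct the query using only the trace.
    reconstructed = reconstruct_query(trace)
    # Step 3: compare the initial and reconstructed queries.
    return similarity(query, reconstructed) < threshold
```

In practice both callables would wrap LLM calls, and the similarity function would likely be something more robust than token overlap (an embedding-based score, for example); the control flow would stay the same.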
