Toward Honest Language Models for Deductive Reasoning

Jiarui Liu; Kaustubh Dhole; Yingheng Wang; Haoyang Wen; Sarah Zhang; Haitao Mao; Gaotang Li; Neeraj Varshney; Jingguo Liu; Xiaoman Pan

arXiv:2511.09222 · cs.CL · December 1, 2025

TL;DR

This paper investigates how to make language models reason honestly by abstaining when conclusions are not entailed, proposing a reinforcement learning method that improves their ability to do so on graph-based deductive tasks.

Contribution

It introduces ACNCHOR, a reinforcement learning approach that uses ground-truth trajectories to prevent early training collapse. This stabilizes training and improves honest deductive reasoning in language models where prompting and existing training methods fall short.

Findings

1. Prompting and current training methods struggle with honest reasoning.

2. ACNCHOR stabilizes training and improves reasoning performance.

3. Ground-truth trajectories prevent early training collapse.
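The finding that ground-truth trajectories prevent early training collapse can be illustrated with a small sketch of GRPO-style group-relative advantages. This is not the paper's implementation. It assumes a binary reward (+1 for the correct conclusion or a correct abstention, -1 otherwise) and that one ground-truth trajectory is appended to each group of sampled rollouts. The names `honesty_reward`, `advantages_with_anchor`, and the `ABSTAIN` label are hypothetical. When every sampled rollout in a group is wrong, the normalized advantages are all zero and the group contributes no learning signal. Adding a correct trajectory restores a positive target.

```python
from statistics import mean, pstdev

ABSTAIN = "ABSTAIN"  # hypothetical label for "the conclusion is not entailed"

def honesty_reward(prediction: str, gold: str) -> float:
    """Assumed binary reward: +1 for the correct conclusion or a correct abstention."""
    return 1.0 if prediction == gold else -1.0

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by its group's mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def advantages_with_anchor(predictions: list[str], gold: str,
                           inject_ground_truth: bool) -> list[float]:
    """Score sampled rollouts; optionally append a ground-truth trajectory (assumed scheme)."""
    rewards = [honesty_reward(p, gold) for p in predictions]
    if inject_ground_truth:
        # The ground-truth trajectory ends in the gold output, so it earns the full reward.
        rewards.append(honesty_reward(gold, gold))
    return group_advantages(rewards)

# Early training: every sampled rollout answers, although the gold label is ABSTAIN.
samples = ["x = 3", "x = 5", "x = 3", "x = 7"]
print(advantages_with_anchor(samples, ABSTAIN, inject_ground_truth=False))
print(advantages_with_anchor(samples, ABSTAIN, inject_ground_truth=True))
```

Without injection the first call prints `[0.0, 0.0, 0.0, 0.0]`. With injection, the four wrong rollouts receive advantages of about -0.5 and the ground-truth trajectory about 2.0. The update therefore pushes the model toward abstaining on this unanswerable instance instead of leaving the group inert.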

Abstract

Deductive reasoning is the process of deriving conclusions strictly from the given premises, without relying on external knowledge. We define honesty in this setting as a model's ability to respond only when the conclusion is logically entailed by the premises, and to abstain otherwise. However, current language models often fail to reason honestly, producing unwarranted answers when the input is insufficient. To study this challenge, we formulate honest deductive reasoning as multi-step tasks where models must either derive the correct conclusion or abstain. We curate two datasets from graph structures, one for linear algebra and one for logical inference, and introduce unanswerable cases by randomly perturbing an edge in half of the instances. We find that prompting and existing training methods, including GRPO with or without supervised fine-tuning initialization, struggle on these…
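The dataset construction described in the abstract can be sketched for the logical-inference case. This is a hypothetical generator, not the authors' code. It assumes each instance is a chain of implications `P0 -> P1 -> ... -> Pn`, that "perturbing an edge" means deleting one premise, and that entailment is decided by graph reachability. `entails`, `make_instance`, and the `ABSTAIN` label are illustrative names, and the linear-algebra dataset is not covered.

```python
import random
from collections import deque

def entails(edges: set[tuple[str, str]], source: str, target: str) -> bool:
    """Check whether `target` follows from `source` by chaining implications (BFS)."""
    graph: dict[str, list[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def make_instance(num_steps: int, perturb: bool, rng: random.Random) -> dict:
    """Hypothetical generator: a chain P0 -> P1 -> ... with an optional edge perturbation."""
    nodes = [f"P{i}" for i in range(num_steps + 1)]
    edges = {(nodes[i], nodes[i + 1]) for i in range(num_steps)}
    if perturb:
        # Assumed perturbation: delete one randomly chosen premise from the chain.
        edges.remove(rng.choice(sorted(edges)))
    source, target = nodes[0], nodes[-1]
    # The gold label comes from reachability, so the model must answer only if entailed.
    gold = "entailed" if entails(edges, source, target) else "ABSTAIN"
    premises = [f"If {a} then {b}." for a, b in sorted(edges)]
    return {"premises": premises,
            "question": f"Given {source}, does {target} hold?",
            "gold": gold}

rng = random.Random(0)
# Perturb half of the instances, mirroring the answerable/unanswerable split.
dataset = [make_instance(4, perturb=(i % 2 == 1), rng=rng) for i in range(6)]
print([ex["gold"] for ex in dataset])
```

This prints `['entailed', 'ABSTAIN', 'entailed', 'ABSTAIN', 'entailed', 'ABSTAIN']`. Because the gold label is computed from reachability rather than from the perturbation flag, it stays correct even if a different perturbation (for example, redirecting an edge in a graph with redundant paths) leaves the conclusion derivable.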

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Ethics and Social Impacts of AI