Let's CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition

Tara Azin; Daniel Dumitrescu; Diana Inkpen; Raj Singh

arXiv:2506.06133·cs.CL·June 9, 2025

Open Access

TL;DR

This paper introduces CONFER, a new dataset for evaluating NLI models on conditional inference and presupposition, revealing that current models struggle with these pragmatic inferences even after fine-tuning.

Contribution

The paper presents CONFER, a novel dataset specifically designed to assess NLI models' ability to handle conditional presuppositions, and evaluates various models' performance on this challenging task.

Findings

1. NLI models perform poorly on presuppositional reasoning in conditionals.
2. Fine-tuning on existing datasets does not significantly improve model performance.
3. Large Language Models show limited ability to infer presuppositions in zero-shot and few-shot settings.

Abstract

Natural Language Inference (NLI) is the task of determining whether a sentence pair represents entailment, contradiction, or a neutral relationship. While NLI models perform well on many inference tasks, their ability to handle fine-grained pragmatic inferences, particularly presupposition in conditionals, remains underexplored. In this study, we introduce CONFER, a novel dataset designed to evaluate how NLI models process inference in conditional sentences. We assess the performance of four NLI models, including two pre-trained models, to examine their generalization to conditional reasoning. Additionally, we evaluate Large Language Models (LLMs), including GPT-4o, LLaMA, Gemma, and DeepSeek-R1, in zero-shot and few-shot prompting settings to analyze their ability to infer presuppositions with and without prior context. Our findings indicate that NLI models struggle with…
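
To make the evaluation setup concrete, the following is a minimal sketch of three-way NLI items and of zero-shot versus few-shot prompt construction. It is not the authors' code: the `NLIExample` type, the `build_prompt` and `accuracy` helpers, the prompt wording, and the example sentences are all hypothetical, and the example labels are chosen only for illustration rather than taken from CONFER.

```python
from dataclasses import dataclass
from typing import Sequence

# The three NLI labels named in the abstract.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with a gold label (hypothetical schema)."""
    premise: str
    hypothesis: str
    label: str  # expected to be one of LABELS


def build_prompt(query: NLIExample, shots: Sequence[NLIExample] = ()) -> str:
    """Build a zero-shot prompt (no shots) or a few-shot prompt (with shots).

    The instruction wording is an assumption, not the prompt used in the paper.
    """
    lines = [
        "Decide whether the premise entails, contradicts, or is neutral "
        "toward the hypothesis. Answer with one word: " + ", ".join(LABELS) + "."
    ]
    # Few-shot setting: show solved examples before the query.
    for ex in shots:
        lines += ["", f"Premise: {ex.premise}",
                  f"Hypothesis: {ex.hypothesis}", f"Answer: {ex.label}"]
    # The query itself is left unanswered for the model to complete.
    lines += ["", f"Premise: {query.premise}",
              f"Hypothesis: {query.hypothesis}", "Answer:"]
    return "\n".join(lines)


def accuracy(gold: Sequence[NLIExample], predicted: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels exactly."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    return sum(ex.label == p for ex, p in zip(gold, predicted)) / len(gold)


# Illustrative items only; sentences and labels are made up for this sketch.
shot = NLIExample("If Maya's dog barks, the neighbors will complain.",
                  "Maya has a cat.", "neutral")
query = NLIExample("If Sam's brother calls, Sam will answer.",
                   "Sam has a brother.", "entailment")

zero_shot_prompt = build_prompt(query)
few_shot_prompt = build_prompt(query, shots=[shot])
```

Sending either prompt to a model and passing its one-word answers to `accuracy` would give a per-setting score; how the paper actually parses model outputs and scores them is not stated in this excerpt.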

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)