When Does Meaning Backfire? Investigating the Role of AMRs in NLI

Junghyun Min; Xiulin Yang; Shira Wein

arXiv:2506.14613·cs.CL·September 26, 2025

Open Access

TL;DR

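This paper examines whether adding Abstract Meaning Representation (AMR) helps pretrained language models generalize in natural language inference (NLI). AMR hinders generalization in fine-tuning and yields slight gains when prompting GPT-4o, but those gains come from amplifying surface-level differences rather than aiding semantic reasoning, which can lead models to predict non-entailment even when the core meaning is preserved.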

Contribution

The paper provides an empirical analysis of AMR's impact on NLI in both fine-tuning and prompting settings, with an ablation study showing that AMR does not aid semantic reasoning and can mislead models.

Findings

01

AMR integration in fine-tuning hampers model generalization.

02

Prompting with AMR yields slight improvements in GPT-4o.

03

AMR amplifies surface-level differences rather than aiding semantic reasoning, which can push models toward non-entailment predictions even when the core meaning is preserved.

Abstract

Natural Language Inference (NLI) relies heavily on adequately parsing the semantic content of the premise and hypothesis. In this work, we investigate whether adding semantic information in the form of an Abstract Meaning Representation (AMR) helps pretrained language models better generalize in NLI. Our experiments integrating AMR into NLI in both fine-tuning and prompting settings show that the presence of AMR in fine-tuning hinders model generalization while prompting with AMR leads to slight gains in GPT-4o. However, an ablation study reveals that the improvement comes from amplifying surface-level differences rather than aiding semantic reasoning. This amplification can mislead models to predict non-entailment even when the core meaning is preserved.
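To make the prompting setting concrete, the following is a minimal sketch of how an NLI prompt might be assembled with and without AMR graphs. It is not the authors' prompt: the function name `build_nli_prompt`, the instruction wording, the field labels, and the two-way label set are all assumptions chosen for illustration. The AMR strings are written by hand in PENMAN notation, and no model or parser is called.

```python
from typing import Optional

# Assumed two-way label set; the prompt format used in the paper may differ.
LABELS = ("entailment", "non-entailment")


def build_nli_prompt(
    premise: str,
    hypothesis: str,
    premise_amr: Optional[str] = None,
    hypothesis_amr: Optional[str] = None,
) -> str:
    """Assemble a plain-text NLI prompt, optionally including AMR graphs.

    Hypothetical helper for illustration only. AMR graphs are passed in as
    PENMAN-notation strings; when omitted, the prompt is text-only.
    """
    lines = [
        "Decide whether the premise entails the hypothesis.",
        f"Answer with one of: {', '.join(LABELS)}.",
        "",
        f"Premise: {premise}",
    ]
    if premise_amr is not None:
        lines.append(f"Premise AMR: {premise_amr}")
    lines.append(f"Hypothesis: {hypothesis}")
    if hypothesis_amr is not None:
        lines.append(f"Hypothesis AMR: {hypothesis_amr}")
    lines.append("Answer:")
    return "\n".join(lines)


# Hand-written AMR for "The boy wants to go." AMR does not encode articles,
# so "A boy wants to go." is given the same graph here.
amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"

text_only = build_nli_prompt("The boy wants to go.", "A boy wants to go.")
with_amr = build_nli_prompt(
    "The boy wants to go.", "A boy wants to go.", amr, amr
)
print(with_amr)
```

Comparing model predictions on `text_only` and `with_amr` prompts for the same sentence pair is one simple way to set up the kind of with-and-without-AMR comparison the abstract describes.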

Taxonomy

Topics: Psychology of Moral and Emotional Judgment · Topic Modeling · Deception detection and forensic psychology