Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds

Victoria Basmov; Yoav Goldberg; Reut Tsarfaty

arXiv:2305.14785·cs.CL·April 12, 2024·5 cites

Open Access

TL;DR

This paper evaluates large language models on simple linguistic inference tasks that most humans find trivial. It reveals blind spots: the models struggle with basic entailments and are misled by the syntactic construction in which the premise is embedded.

Contribution

The paper introduces targeted evaluation sets for three simple inference tasks and shows that embedding the premise under presupposition triggers or non-factives further degrades LLM predictions.

Findings

01

Models show moderate to low performance on inference tasks.

02

Embedding the premise under presupposition triggers or non-factives confuses models, causing them to under-predict or over-predict certain entailment labels regardless of the true relation.

03

Even strong LLMs have blind spots regarding specific entailment types.

Abstract

We evaluate LLMs' language understanding capacities on simple inference tasks that most humans find trivial. Specifically, we target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. We design evaluation sets for these tasks and conduct experiments in both zero-shot and chain-of-thought setups, and with multiple prompts and LLMs. The models exhibit moderate to low performance on these evaluation sets. Subsequent experiments show that embedding the premise in syntactic constructions that should preserve the entailment relations (presupposition triggers) or change them (non-factives) further confuses the models, causing them to either under-predict or over-predict certain entailment labels regardless of the true relation, often disregarding the nature of the embedding context. Overall these results…
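The embedding manipulation described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper's data or code: the `NLIItem` class, the two frame strings, the `embed` and `build_prompt` helpers, the prompt wording, and the example sentences are all hypothetical. The sketch assumes the standard expectation that a factive frame (a presupposition trigger such as "realizes that") keeps the original label, while a non-factive frame (such as "believes that") turns the label into neutral.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLIItem:
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


# Hypothetical embedding frames, chosen for illustration only.
FACTIVE_FRAME = "Sam realizes that {clause}."      # presupposes the clause
NON_FACTIVE_FRAME = "Sam believes that {clause}."  # does not commit to the clause


def embed(item: NLIItem, frame: str, preserves_label: bool) -> NLIItem:
    """Embed the premise in a frame and derive the expected gold label."""
    clause = item.premise.rstrip(".")
    # Naive lowercasing of the first letter; wrong for premises that
    # start with a proper noun, which this sketch does not handle.
    clause = clause[0].lower() + clause[1:]
    new_label = item.label if preserves_label else "neutral"
    return NLIItem(frame.format(clause=clause), item.hypothesis, new_label)


def build_prompt(item: NLIItem) -> str:
    """Format a zero-shot prompt; the wording is an assumed template."""
    return (
        f"Premise: {item.premise}\n"
        f"Hypothesis: {item.hypothesis}\n"
        "Does the premise entail the hypothesis? "
        "Answer with entailment, neutral, or contradiction."
    )


# Hypothetical base item (not from the paper's evaluation sets).
base = NLIItem(
    premise="The children ate all the cookies.",
    hypothesis="The children ate something.",
    label="entailment",
)
factive = embed(base, FACTIVE_FRAME, preserves_label=True)
non_factive = embed(base, NON_FACTIVE_FRAME, preserves_label=False)

for item in (base, factive, non_factive):
    print(build_prompt(item))
    print(f"Expected label: {item.label}\n")
```

Comparing a model's answers across the three prompts against the expected labels would show whether it tracks the embedding context or predicts the same label regardless of it.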

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Explainable Artificial Intelligence (XAI)