To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity

Anastasiia Sedova; Robert Litschko; Diego Frassinelli; Benjamin Roth; Barbara Plank

arXiv:2407.17125·cs.CL·October 7, 2024

TL;DR

This paper evaluates how large language models handle ambiguous entities, revealing they often struggle with consistency and exhibit biases, which impacts their trustworthiness and reliability.

Contribution

It introduces an evaluation protocol to disentangle knowledge possession from application, and assesses state-of-the-art LLMs on entity ambiguity, highlighting their self-inconsistencies.

Findings

01 LLMs average only 85% accuracy when choosing the correct reading of an ambiguous entity

02 Accuracy falls as low as 75% with underspecified prompts

03 Models show biases and struggle to apply their knowledge consistently

Abstract

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess…
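
The distinction the abstract draws between knowing and applying knowledge can be illustrated with a small scoring sketch. This is not the authors' protocol: the `Trial` record, its field names, and the example entities and labels are all hypothetical, invented for illustration only. It assumes each trial records whether the model confirmed the expected entity reading when asked directly, and which reading it actually used when answering.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # All fields are hypothetical; the paper's actual data format may differ.
    entity: str         # ambiguous entity name
    expected_type: str  # reading the prompt calls for
    knows_type: bool    # model confirmed the expected reading when asked directly
    applied_type: str   # reading the model actually used in its answer


def reading_accuracy(trials):
    """Fraction of trials where the applied reading matches the expected one."""
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(t.applied_type == t.expected_type for t in trials)
    return hits / len(trials)


def knows_but_misapplies(trials):
    """Entities where the model showed the knowledge but applied another reading."""
    return [
        t.entity
        for t in trials
        if t.knows_type and t.applied_type != t.expected_type
    ]


# Made-up illustration data, not results from the paper.
trials = [
    Trial("Jaguar", "animal", True, "animal"),
    Trial("Amazon", "company", True, "river"),
    Trial("Apple", "fruit", True, "fruit"),
    Trial("Mercury", "planet", False, "element"),
]

print(reading_accuracy(trials))      # 0.5
print(knows_but_misapplies(trials))  # ['Amazon']
```

Keeping the two measures separate is the point of the sketch: a low accuracy alone does not say whether the model lacks the knowledge or holds it and fails to apply it.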

Code & Models

Repositories

anasedova/toknow_or_nottoknow (Official)

Videos

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity · Underline

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods

Methods: Focus