Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs

Akhila Yerukola; Saujas Vaduguru; Daniel Fried; Maarten Sap

arXiv:2405.08760 · cs.CL · June 21, 2024

TL;DR

This paper introduces a generative evaluation of whether large language models respond to the intended meaning of a non-literal utterance rather than to its literal wording, and finds that current models often fail to do so.

Contribution

The paper proposes a generative approach to evaluating LLMs' understanding of non-literal language, in contrast to the discriminative evaluations used in most prior work, and shows that LLMs struggle to respond in line with a speaker's true intention.

Findings

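01. LLMs achieve only 50-55% accuracy on average when responding to non-literal utterances.

02. Explicitly providing oracle intentions improves performance substantially (e.g., to 75% for Mistral-Instruct), though models still have difficulty using the given intention.

03. Chain-of-thought prompting raises accuracy only modestly, to 60%.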

Abstract

Humans often express their communicative intents indirectly or non-literally, which requires their interlocutors -- human or AI -- to understand beyond the literal meaning of words. While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal utterances. Ideally, an LLM should respond in line with the true intention of a non-literal utterance, not its literal interpretation. Our findings show that LLMs struggle to generate pragmatically relevant responses to non-literal language, achieving only 50-55% accuracy on average. While explicitly providing oracle intentions significantly improves performance (e.g., 75% for Mistral-Instruct), this still indicates challenges in leveraging given intentions to produce appropriate responses.…
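As a rough illustration of the generative setup described in the abstract, the sketch below scores a responder by checking whether each generated reply aligns with the true intention of an utterance rather than with its literal reading. It is a minimal sketch under stated assumptions, not the paper's evaluation procedure: the `Item` fields, the `toy_judge` word-overlap heuristic, the keyword strings, and the second example utterance are all hypothetical and chosen only to make the example runnable.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    utterance: str        # non-literal utterance shown to the model
    true_intent: str      # hypothetical keywords for what the speaker means
    literal_reading: str  # hypothetical keywords for the surface reading


def _tokens(text: str) -> set:
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_judge(response: str, item: Item) -> bool:
    """Toy stand-in for a judge (an assumption, not the paper's method):
    a response counts as correct if it shares more words with the true
    intent than with the literal reading."""
    words = _tokens(response)
    intent_overlap = len(words & _tokens(item.true_intent))
    literal_overlap = len(words & _tokens(item.literal_reading))
    return intent_overlap > literal_overlap


def intent_accuracy(
    items: Sequence[Item],
    respond: Callable[[str], str],
    judge: Callable[[str, Item], bool] = toy_judge,
) -> float:
    """Fraction of items whose generated response matches the true intent."""
    if not items:
        return 0.0
    return sum(judge(respond(it.utterance), it) for it in items) / len(items)


# Illustrative data; the first utterance echoes the paper's title,
# the second is invented for this sketch.
ITEMS = [
    Item("Is the Pope Catholic?",
         "emphatic obvious agreement",
         "question pope religion catholic"),
    Item("It's a bit chilly in here.",
         "request close window warm room",
         "statement temperature chilly cold"),
]

# Two hypothetical responders, implemented as lookup tables.
LITERAL = {
    "Is the Pope Catholic?": "Yes, the Pope is Catholic.",
    "It's a bit chilly in here.": "Yes, the temperature is cold.",
}
PRAGMATIC = {
    "Is the Pope Catholic?": "Glad we are in obvious agreement!",
    "It's a bit chilly in here.": "I'll close the window to warm the room.",
}
```

Calling `intent_accuracy(ITEMS, LITERAL.get)` scores the literal responder and `intent_accuracy(ITEMS, PRAGMATIC.get)` scores the pragmatic one. In a real evaluation, `respond` would wrap an LLM call and `judge` would be replaced by a more reliable comparison against the true intention.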

Code & Models

Repositories

Akhila-Yerukola/generative-intention-resolution
PyTorch · Official

Videos

Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs

Taxonomy

Topics: Taxation and Legal Issues · Theology and Canon Law Studies