How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models

Judith Sieker; Sina Zarrieß

arXiv:2604.15873·cs.CL·April 20, 2026

TL;DR

This paper compares large language models' abilities as pragmatic listeners, judging the appropriateness of linguistic outputs, and as pragmatic speakers, generating pragmatically appropriate language. It finds a robust asymmetry: many models perform substantially better as judges than as generators, which argues for evaluating the two roles together.

Contribution

The paper directly compares LLMs' pragmatic judgment and pragmatic generation across three pragmatic settings, shows that the two are only weakly aligned, and argues for evaluating them together rather than in isolation.

Findings

01

Many models perform substantially better as pragmatic listeners than as speakers.

02

Pragmatic judgment and generation are only weakly correlated in LLMs.

03

Results suggest the need for more integrated pragmatic evaluation practices.
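
To make the findings above concrete, here is a minimal sketch of how a listener-speaker gap and a cross-role correlation could be computed from per-model scores. It is not the paper's evaluation code: the model names and all scores are hypothetical placeholders, and treating both roles as a single accuracy per model is an assumption made for illustration.

```python
from math import sqrt

# Hypothetical placeholder scores, NOT results from the paper.
# listener_acc: how often a model judges appropriateness correctly.
# speaker_acc: how often a model's own generations are appropriate.
listener_acc = {"model_a": 0.90, "model_b": 0.85, "model_c": 0.80, "model_d": 0.75}
speaker_acc = {"model_a": 0.60, "model_b": 0.70, "model_c": 0.50, "model_d": 0.65}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


models = sorted(listener_acc)

# Per-model asymmetry: positive means better as listener than as speaker.
gaps = {m: listener_acc[m] - speaker_acc[m] for m in models}

# Cross-role alignment: does a better judge also tend to be a better generator?
r = pearson([listener_acc[m] for m in models], [speaker_acc[m] for m in models])

for m in models:
    print(f"{m}: listener-speaker gap = {gaps[m]:+.2f}")
print(f"Pearson r between roles = {r:.2f}")
```

With these placeholder numbers every gap is positive while the correlation stays close to zero, which is the pattern the findings describe: a model can judge well without generating well, so a score in one role says little about the other.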

Abstract

Large language models (LLMs) are increasingly studied as repositories of linguistic knowledge. In this line of work, models are commonly evaluated both as generators of language and as judges of linguistic output, yet these two roles are rarely examined in direct relation to one another. As a result, it remains unclear whether success in one role aligns with success in the other. In this paper, we address this question for pragmatic competence by comparing LLMs' performance as pragmatic listeners, judging the appropriateness of linguistic outputs, and as pragmatic speakers, generating pragmatically appropriate language. We evaluate multiple open-weight and proprietary LLMs across three pragmatic settings. We find a robust asymmetry between pragmatic evaluation and pragmatic generation: many models perform substantially better as listeners than as speakers. Our results suggest that…
