Pragmatic inference of scalar implicature by LLMs
Ye-eun Cho, Seong mook Kim

TL;DR
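This paper examines how the large language models BERT and GPT-2 perform pragmatic inference of scalar implicature, that is, interpreting 'some' as 'not all'. Without context, both models derive the implicature; when a Question Under Discussion (QUD) is supplied as a contextual cue, BERT stays consistent across QUD types while GPT-2 has difficulty with the type that requires pragmatic inference.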
Contribution
It demonstrates that BERT naturally encodes pragmatic implicature, while GPT-2's inference depends on contextual cues, providing insights into their underlying pragmatic processing mechanisms.
Findings
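Both BERT and GPT-2 interpret 'some' as 'not all' in the absence of context.
With a QUD as contextual cue, BERT performs consistently across QUD types, while GPT-2 has difficulty with the QUD type that requires pragmatic inference.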
BERT's behavior aligns with the Default model of implicature.
Abstract
This study investigates how Large Language Models (LLMs), particularly BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019), engage in pragmatic inference of scalar implicature, such as that triggered by some. Two sets of experiments were conducted using cosine similarity and next sentence/token prediction as experimental methods. The results of Experiment 1 showed that both models interpret some as the pragmatic implicature not all in the absence of context, aligning with human language processing. In Experiment 2, in which a Question Under Discussion (QUD) was presented as a contextual cue, BERT performed consistently regardless of QUD type, while GPT-2 encountered processing difficulties because a certain type of QUD required pragmatic inference to derive the implicature. The findings revealed that, in terms of theoretical approaches, BERT inherently incorporates pragmatic implicature not all within…
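The cosine-similarity measure named in the abstract can be illustrated with a minimal sketch. The vectors below are hypothetical toy values, not embeddings from BERT or GPT-2, and the comparison of a "not all" reading against an "all" reading is an assumption made for illustration; the paper's actual sentence pairs and embedding extraction are not given in this summary.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: NOT real model outputs).
emb_some = [0.9, 0.1, 0.3]      # stands in for a sentence containing "some"
emb_not_all = [0.8, 0.2, 0.4]   # stands in for its "not all" paraphrase
emb_all = [0.1, 0.9, 0.2]       # stands in for its "all" paraphrase

sim_pragmatic = cosine_similarity(emb_some, emb_not_all)
sim_semantic = cosine_similarity(emb_some, emb_all)

# A higher similarity to the "not all" paraphrase would be read as the
# representation leaning toward the pragmatic interpretation.
prefers_pragmatic = sim_pragmatic > sim_semantic
print(prefers_pragmatic)
```

With real models, the toy lists would be replaced by sentence or token representations extracted from the model under study; how those representations are pooled is a design choice this sketch leaves open.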
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
Methods: Linear Layer · WordPiece · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay · Cosine Annealing · Attention Is All You Need · Weight Decay · Adam
