From Polyester Girlfriends to Blind Mice: Creating the First Pragmatics Understanding Benchmarks for Slovene
Mojca Brglez, Špela Vintar

TL;DR
This paper introduces the first pragmatics understanding benchmarks for Slovene, assessing language models' ability to interpret nuanced, context-dependent, and culture-specific language features, highlighting current limitations and future challenges.
Contribution
It presents SloPragEval and SloPragMega, pioneering benchmarks for Slovene pragmatics, and evaluates model performance, emphasizing the importance of native data and human validation.
Findings
Models have improved in understanding nuanced language
Models struggle with implied speaker meaning in culture-specific contexts
Significant performance gap between proprietary and open-source models
Abstract
Large language models are demonstrating increasing capabilities, excelling at benchmarks once considered very difficult. As their capabilities grow, there is a need for more challenging evaluations that go beyond surface-level linguistic competence. Namely, language competence involves not only syntax and semantics but also pragmatics, i.e., understanding situational meaning as shaped by context as well as linguistic and cultural norms. To contribute to this line of research, we introduce SloPragEval and SloPragMega, the first pragmatics understanding benchmarks for Slovene that contain altogether 405 multiple-choice questions. We discuss the difficulties of translation, describe the campaign to establish a human baseline, and report pilot evaluations with LLMs. Our results indicate that current models have greatly improved in understanding nuanced language but may still fail to infer…
