MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models
Dojun Park, Jiwoo Lee, Seohyun Park, Hyeyun Jeong, Youngeun Koo, Soonha Hwang, Seonwoo Park, Sungeun Lee

TL;DR
MultiPragEval is a multilingual benchmark that evaluates how well large language models understand pragmatic language use in four languages (English, German, Korean, and Chinese). It reveals significant performance differences between models.
Contribution
This paper introduces the first multilingual pragmatic evaluation framework for LLMs, based on Grice's maxims, enabling comprehensive assessment of contextual and inferential language skills.
Findings
Claude3-Opus outperforms other models in all languages
Solar-10.7B and Qwen1.5-14B are strong open-source competitors
The benchmark provides insights into models' pragmatic inference capabilities
Abstract
As the capabilities of Large Language Models (LLMs) expand, it becomes increasingly important to evaluate them beyond basic knowledge assessment, focusing on higher-level language understanding. This study introduces MultiPragEval, the first multilingual pragmatic evaluation of LLMs, designed for English, German, Korean, and Chinese. Comprising 1200 question units categorized according to Grice's Cooperative Principle and its four conversational maxims, MultiPragEval enables an in-depth assessment of LLMs' contextual awareness and their ability to infer implied meanings. Our findings demonstrate that Claude3-Opus significantly outperforms other models in all tested languages, establishing a state-of-the-art in the field. Among open-source models, Solar-10.7B and Qwen1.5-14B emerge as strong competitors. By analyzing pragmatic inference, we provide valuable insights into the capabilities…
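To make the benchmark's structure concrete, the following is a minimal sketch of how per-language, per-maxim accuracy could be tallied. It is not the authors' evaluation code. The `QuestionUnit` fields, the `accuracy_by_group` helper, and the answer-label format are hypothetical names introduced for illustration. The maxim names are the four standard Gricean maxims; the benchmark's exact category labels may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

# Languages named in the abstract.
LANGUAGES = ("English", "German", "Korean", "Chinese")
# The four standard Gricean maxims (assumed label names, not taken from the benchmark files).
MAXIMS = ("quantity", "quality", "relation", "manner")


@dataclass(frozen=True)
class QuestionUnit:
    """One benchmark item (hypothetical schema)."""
    language: str  # one of LANGUAGES
    maxim: str     # one of MAXIMS
    gold: str      # label of the correct answer option, e.g. "A"


def accuracy_by_group(units, predictions):
    """Return {(language, maxim): accuracy} for paired units and predictions."""
    if len(units) != len(predictions):
        raise ValueError("units and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for unit, predicted in zip(units, predictions):
        key = (unit.language, unit.maxim)
        total[key] += 1
        correct[key] += int(predicted == unit.gold)
    # Only groups that actually occur in `units` appear in the result.
    return {key: correct[key] / total[key] for key in total}
```

Grouping by the `(language, maxim)` pair keeps cross-lingual and cross-category comparisons in one table, which is the kind of breakdown the abstract's "in-depth assessment" suggests.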
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
