MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models

Dojun Park; Jiwoo Lee; Seohyun Park; Hyeyun Jeong; Youngeun Koo; Soonha Hwang; Seonwoo Park; Sungeun Lee

arXiv:2406.07736 · cs.CL · October 1, 2024

TL;DR

MultiPragEval is a multilingual benchmark that evaluates large language models' understanding of pragmatic language use in English, German, Korean, and Chinese. Results show Claude3-Opus significantly ahead of other models in every tested language.

Contribution

This paper introduces the first multilingual pragmatic evaluation framework for LLMs, based on Grice's maxims, enabling comprehensive assessment of contextual and inferential language skills.

Findings

01. Claude3-Opus outperforms other models in all languages

02. Solar-10.7B and Qwen1.5-14B are strong open-source competitors

03. The benchmark provides insights into models' pragmatic inference capabilities

Abstract

As the capabilities of Large Language Models (LLMs) expand, it becomes increasingly important to evaluate them beyond basic knowledge assessment, focusing on higher-level language understanding. This study introduces MultiPragEval, the first multilingual pragmatic evaluation of LLMs, designed for English, German, Korean, and Chinese. Comprising 1200 question units categorized according to Grice's Cooperative Principle and its four conversational maxims, MultiPragEval enables an in-depth assessment of LLMs' contextual awareness and their ability to infer implied meanings. Our findings demonstrate that Claude3-Opus significantly outperforms other models in all tested languages, establishing a state-of-the-art in the field. Among open-source models, Solar-10.7B and Qwen1.5-14B emerge as strong competitors. By analyzing pragmatic inference, we provide valuable insights into the capabilities…
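As a rough illustration of how results on a benchmark organized this way could be summarized, the sketch below groups answered question units by language and by Gricean maxim and computes accuracy for each group. It is not the authors' evaluation code: the `QuestionUnit` record, its field names, the language codes, and the toy data are all assumptions made for this example, and the benchmark's real schema and scoring procedure may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

# Grice's four conversational maxims; the string labels are illustrative.
MAXIMS = ("quantity", "quality", "relation", "manner")
# Assumed codes for English, German, Korean, and Chinese.
LANGUAGES = ("en", "de", "ko", "zh")


@dataclass(frozen=True)
class QuestionUnit:
    """Hypothetical record layout, not the benchmark's actual schema."""
    language: str
    maxim: str
    gold: str       # correct answer label
    predicted: str  # answer label produced by the model


def accuracy_by_group(units, key):
    """Return {group: fraction of correct answers}, grouping units by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for unit in units:
        group = key(unit)
        total[group] += 1
        correct[group] += int(unit.predicted == unit.gold)
    return {group: correct[group] / total[group] for group in total}


# Toy data invented for illustration only; these are not benchmark results.
units = [
    QuestionUnit("en", "quantity", "A", "A"),
    QuestionUnit("en", "relation", "B", "C"),
    QuestionUnit("ko", "manner", "D", "D"),
    QuestionUnit("ko", "quality", "A", "A"),
]

per_language = accuracy_by_group(units, key=lambda u: u.language)
per_maxim = accuracy_by_group(units, key=lambda u: u.maxim)
```

Passing the grouping as a `key` function keeps a single aggregation routine usable for per-language, per-maxim, or combined (language, maxim) breakdowns.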

Code & Models

Repositories

DojunPark/MultiPragEval (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques