A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks

Elie Antoine (LIS; TALEP); Frédéric Béchet (LIS; TALEP); Géraldine Damnati; Philippe Langlais (DIRO)

arXiv:2501.17569·cs.CL·January 30, 2025

TL;DR

This paper proposes a linguistically-motivated evaluation method for reading comprehension models, using semantic frame annotation to identify linguistic complexities that challenge models regardless of size or architecture.

Contribution

It introduces a novel evaluation approach based on semantic complexity factors, validated on French and English benchmarks, to better understand model limitations in reading comprehension.

Findings

01

Semantic complexity factors predict model failures.

02

Fine-grained automatic evaluation reveals specific linguistic challenges.

03

State-of-the-art models struggle with certain linguistic features.

Abstract

We introduce an evaluation methodology for reading comprehension tasks based on the intuition that certain examples, by virtue of their linguistic complexity, consistently yield lower scores regardless of model size or architecture. We capitalize on semantic frame annotation to characterize this complexity, and study seven complexity factors that may account for models' difficulty. We first deploy this methodology on a carefully annotated French reading comprehension benchmark, showing that two of these complexity factors are good predictors of model failure, while the others are less so. We then deploy our methodology on a well-studied English benchmark, using ChatGPT as a proxy for semantic annotation. Our study reveals that fine-grained, linguistically-motivated automatic evaluation of a reading comprehension task is not only possible, but helps understand models'…
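
The abstract does not state which statistic links a complexity factor to model failure. As an illustration only, the sketch below measures, for each model, the accuracy drop on examples that carry a given factor compared with examples that do not; a positive drop for every model is consistent with the idea that the factor makes examples harder regardless of model size or architecture. The `Example` record, the `factor_gap` and `consistently_harder` functions, the factor name `factor_A`, the model names, and the toy data are all hypothetical and are not the authors' method, factors, or results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record: one reading-comprehension question, the set of
    # complexity factors annotated on it, and per-model correctness.
    factors: set[str]
    correct: dict[str, bool]  # model name -> answered correctly?


def accuracy(examples: list[Example], model: str) -> float:
    # Fraction of examples the given model answered correctly.
    if not examples:
        return float("nan")
    return sum(ex.correct[model] for ex in examples) / len(examples)


def factor_gap(examples: list[Example], factor: str, models: list[str]) -> dict[str, float]:
    # Accuracy on examples WITHOUT the factor minus accuracy on examples WITH it.
    with_f = [ex for ex in examples if factor in ex.factors]
    without_f = [ex for ex in examples if factor not in ex.factors]
    return {m: accuracy(without_f, m) - accuracy(with_f, m) for m in models}


def consistently_harder(gaps: dict[str, float]) -> bool:
    # True only if every model loses accuracy when the factor is present.
    return all(g > 0 for g in gaps.values())


# Toy data (invented for illustration): two models, one hypothetical factor.
examples = [
    Example({"factor_A"}, {"small": False, "large": True}),
    Example({"factor_A"}, {"small": False, "large": False}),
    Example(set(), {"small": True, "large": True}),
    Example(set(), {"small": True, "large": False}),
]
gap = factor_gap(examples, "factor_A", ["small", "large"])
print(gap)                        # {'small': 1.0, 'large': 0.0}
print(consistently_harder(gap))   # False
```

In this toy case the factor hurts only one model, so it would not count as a model-independent predictor of failure; a real analysis would also need a significance test and enough examples per factor.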

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning