Investigating a Benchmark for Training-set free Evaluation of Linguistic Capabilities in Machine Reading Comprehension

Viktor Schlegel; Goran Nenadic; Riza Batista-Navarro

arXiv:2408.05023·cs.CL·August 12, 2024

Open Access

TL;DR

This paper examines a training-set free evaluation framework for machine reading comprehension (MRC) that uses synthetically generated challenge sets to assess linguistic capabilities without relying on large crowd-sourced datasets.

Contribution

It examines an evaluation approach based on synthetic challenge sets, addressing limitations of crowd-sourced datasets such as spurious correlations and a lack of challenging examples.

Findings

1. Synthetic challenge sets can compete with crowd-sourced data in naturalness and lexical diversity.

2. State-of-the-art language model-based MRC systems can learn to succeed on the challenge set without capturing the underlying phenomena.

3. The approach offers a scalable alternative for evaluating linguistic capabilities in MRC.

Abstract

Performance of NLP systems is typically evaluated by collecting a large-scale dataset by means of crowd-sourcing to train a data-driven model and evaluate it on a held-out portion of the data. This approach has been shown to suffer from spurious correlations and the lack of challenging examples that represent the diversity of natural language. Instead, we examine a framework for evaluating optimised models in a training-set free setting on synthetically generated challenge sets. We find that despite the simplicity of the generation method, the data can compete with crowd-sourced datasets with regard to naturalness and lexical diversity for the purpose of evaluating the linguistic capabilities of MRC models. We conduct further experiments and show that state-of-the-art language model-based MRC systems can learn to succeed on the challenge set correctly, although without capturing the…

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning

Methods: Sparse Evolutionary Training