ELQA: A Corpus of Metalinguistic Questions and Answers about English

Shabnam Behzad; Keisuke Sakaguchi; Nathan Schneider; Amir Zeldes

arXiv:2205.00395 · cs.CL · July 4, 2023

TL;DR

ELQA is a metalinguistic corpus of more than 70k questions about English, with answers, intended to support evaluation of language understanding models and educational applications for language learners.

Contribution

The paper introduces ELQA, a metalinguistic dataset collected from two online forums, defines a free-form question answering task on it, and evaluates how well LLMs generate metalinguistic answers.

Findings

01 LLMs can produce metalinguistic answers with varying accuracy.

02 ELQA enables research on language understanding and educational applications.

03 The dataset supports the development of models with better metalinguistic capabilities.

Abstract

We present ELQA, a corpus of questions and answers in and about the English language. Collected from two online forums, the >70k questions (from English learners and others) cover wide-ranging topics including grammar, meaning, fluency, and etymology. The answers include descriptions of general properties of English vocabulary and grammar as well as explanations about specific (correct and incorrect) usage examples. Unlike most NLP datasets, this corpus is metalinguistic: it consists of language about language. As such, it can facilitate investigations of the metalinguistic capabilities of NLU models, as well as educational applications in the language learning domain. To study these capabilities, we define a free-form question answering task on our dataset and conduct evaluations on multiple large language models (LLMs) to analyze their capacity to generate metalinguistic answers.
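To make the free-form question answering setup concrete, the following is a minimal sketch of how one question-answer pair might be represented and how a generated answer could be compared with a reference answer. Everything here is an assumption for illustration: the `QAExample` layout, the sample question and answers, and the token-overlap F1 score are hypothetical and are not taken from the paper or the ELQA repository, whose actual fields and evaluation metrics may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAExample:
    """Hypothetical record layout; the actual ELQA fields may differ."""
    question: str
    reference_answer: str


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 between a generated and a reference answer.

    This is an illustrative stand-in metric, not the paper's evaluation.
    """
    gen, ref = Counter(tokenize(generated)), Counter(tokenize(reference))
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((gen & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Made-up example in the spirit of a learner's usage question.
example = QAExample(
    question="Is 'less people' or 'fewer people' correct?",
    reference_answer="Use fewer with countable nouns, so fewer people.",
)
model_answer = "Fewer people is correct because people is countable."
score = token_f1(model_answer, example.reference_answer)
print(f"token F1: {score:.3f}")
```

A surface-overlap score like this rewards shared wording rather than correctness, so for metalinguistic answers it would at most be a rough first signal alongside other evaluation.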

Code & Models

Repositories

shabnam-b/elqa (official)

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling