TL;DR
This paper introduces QFrCoLA, a Quebec-French linguistic acceptability dataset of 25,153 in-domain and 2,675 out-of-domain sentences, and uses it alongside seven other acceptability corpora to benchmark seven language models. Fine-tuned Transformer-based models are strong baselines on the task, while pre-trained cross-lingual models used zero-shot perform poorly.
Contribution
The creation of QFrCoLA, a new extensive dataset for Quebec French acceptability judgments, and its use to evaluate and compare language model capabilities.
Findings
Fine-tuned Transformer models outperform the other methods benchmarked on QFrCoLA.
Pre-trained cross-lingual models lack linguistic judgment skills in Quebec French.
QFrCoLA is a challenging benchmark for linguistic acceptability.
Abstract
Large and Transformer-based language models perform outstandingly in various downstream tasks. However, there is limited understanding of how these models internalize linguistic knowledge, so various linguistic benchmarks have recently been proposed to facilitate syntactic evaluation of language models across languages. This paper introduces QFrCoLA (Quebec-French Corpus of Linguistic Acceptability Judgments), a normative binary acceptability judgment dataset comprising 25,153 in-domain and 2,675 out-of-domain sentences. Our study leverages the QFrCoLA dataset and seven other linguistic binary acceptability judgment corpora to benchmark seven language models. The results demonstrate that, on average, fine-tuned Transformer-based LMs are strong baselines for most languages and that large language models used as zero-shot binary classifiers perform poorly on the task. However, for the…