Monolingual and Cross-Lingual Acceptability Judgments with the Italian CoLA corpus

Daniela Trotta; Raffaele Guarasci; Elisa Leonardelli; Sara Tonelli

arXiv:2109.12053·cs.CL·October 14, 2022

TL;DR

This paper introduces the ItaCoLA corpus for Italian acceptability judgments, enabling linguistic acceptability research beyond English and exploring cross-lingual transformer-based approaches.

Contribution

The paper creates and describes ItaCoLA, a new Italian acceptability corpus that makes cross-lingual studies and benchmarking possible for a language other than English.

Findings

01

In-domain and out-of-domain classification performance analyzed

02

Specific evaluation of nine linguistic phenomena performed

03

Cross-lingual fine-tuning shows potential benefits

Abstract

The development of automated approaches to linguistic acceptability has been greatly fostered by the availability of the English CoLA corpus, which has also been included in the widely used GLUE benchmark. However, this kind of research for languages other than English, as well as the analysis of cross-lingual approaches, has been hindered by the lack of resources with a comparable size in other languages. We have therefore developed the ItaCoLA corpus, containing almost 10,000 sentences with acceptability judgments, which has been created following the same approach and the same steps as the English one. In this paper we describe the corpus creation, we detail its content, and we present the first experiments on this new resource. We compare in-domain and out-of-domain classification, and perform a specific evaluation of nine linguistic phenomena. We also present the first…
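Acceptability classification of the kind described in the abstract is a binary task: each sentence is labelled acceptable or unacceptable, and a classifier's predictions are scored against those labels. The sketch below shows one way such predictions could be scored, using accuracy and the Matthews correlation coefficient (MCC), the metric conventionally reported for the English CoLA task in GLUE. This listing does not state which metrics the paper reports, so the choice of MCC, the label encoding (1 = acceptable, 0 = unacceptable), and the toy label lists are all assumptions made for illustration, not the authors' evaluation code.

```python
import math


def confusion_counts(gold, pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = acceptable (assumed encoding)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def mcc(gold, pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(gold, pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels, invented for illustration only (not taken from ItaCoLA).
toy_gold = [1, 1, 1, 0, 0, 1, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1, 0, 1]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(toy_gold, toy_pred):.3f}")
    print(f"MCC      = {mcc(toy_gold, toy_pred):.3f}")
```

MCC is often preferred over accuracy when the two classes are imbalanced, because a classifier that always predicts the majority label gets an MCC of 0 even if its accuracy is high.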

Code & Models

Repositories

dhfbk/itacola-dataset (official)

Datasets

gsarti/itacola
dataset · 45 dl

Taxonomy

Methods: COLA