Unravelling Interlanguage Facts via Explainable Machine Learning

Barbara Berti; Andrea Esuli; Fabrizio Sebastiani

arXiv:2208.01468·cs.CL·August 3, 2022

TL;DR

This paper uses explainable machine learning to analyse which linguistic features reveal a writer's native language, and which reveal whether a text was written by a native or a non-native speaker.

Contribution

It introduces an explainable ML approach to interpreting native language identification (NLI) classifiers, identifying the linguistic traits that distinguish native languages from one another and non-native from native writing.

Findings

01. Linguistic traits such as lexical, morphological, and syntactic features are highly indicative of native language.

02. Explainable ML helps uncover which features classifiers rely on for NLI.

03. Case studies highlight specific traits of Spanish-speaking and Italian-speaking learners of English.

Abstract

Native language identification (NLI) is the task of training (via supervised machine learning) a classifier that guesses the native language of the author of a text. This task has been extensively researched in the last decade, and the performance of NLI systems has steadily improved over the years. We focus on a different facet of the NLI task, i.e., that of analysing the internals of an NLI classifier trained by an explainable machine learning algorithm, in order to obtain explanations of its classification decisions, with the ultimate goal of gaining insight into which linguistic phenomena "give a speaker's native language away". We use this perspective in order to tackle both NLI and a (much less researched) companion task, i.e., guessing whether a text has been written by a native or a non-native speaker. Using three datasets of different provenance (two datasets of…
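The general idea of reading explanations off a classifier's internals can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: this excerpt does not name the learning algorithm, the feature set, or the data used by the authors, so the sketch substitutes a plain logistic regression over word unigrams and bigrams, trained on a handful of invented sentences for the native vs. non-native task. Because the model is linear, each learned weight says how strongly a feature pushes the decision towards one class.

```python
import math
from collections import Counter

# Invented toy sentences (NOT from the paper's datasets).
# Label 1 = non-native writer, label 0 = native writer (hypothetical labels).
TRAIN = [
    ("i am agree with you", 1),
    ("she have many informations", 1),
    ("i am agree that he have reason", 1),
    ("i agree with you", 0),
    ("she has a lot of information", 0),
    ("i agree that he is right", 0),
]

def featurize(text):
    """Count word unigrams and bigrams (a stand-in for richer linguistic features)."""
    tokens = text.lower().split()
    bigrams = [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

def score(feats, weights, bias):
    """Linear score: bias plus the weighted sum of feature counts."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_logreg(samples, epochs=200, lr=0.1):
    """Fit a binary logistic regression with plain SGD, starting from zero weights."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            feats = featurize(text)
            p = 1.0 / (1.0 + math.exp(-score(feats, weights, bias)))
            err = label - p  # gradient of the log-likelihood w.r.t. the score
            bias += lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) + lr * err * v
    return weights, bias

def predict_proba(text, weights, bias):
    """Estimated probability that the text was written by a non-native writer."""
    return 1.0 / (1.0 + math.exp(-score(featurize(text), weights, bias)))

def top_features(weights, k=5):
    """Return (features most indicative of label 1, features most indicative of label 0)."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    return ranked[-k:][::-1], ranked[:k]

weights, bias = train_logreg(TRAIN)
non_native_cues, native_cues = top_features(weights, k=5)
print("cues for non-native:", non_native_cues)
print("cues for native:", native_cues)
```

In this toy setting, a bigram such as `am_agree`, which occurs only in the sentences labelled non-native, necessarily receives a positive weight, while `has`, which occurs only in the native sentences, receives a negative one. Inspecting the ranked weights is therefore one simple way to ask which phenomena the classifier relies on.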

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling