Unravelling Interlanguage Facts via Explainable Machine Learning
Barbara Berti, Andrea Esuli, Fabrizio Sebastiani

TL;DR
This paper uses explainable machine learning to examine which linguistic features reveal an author's native language, and which reveal whether a text was written by a native or a non-native speaker.
Contribution
It introduces an explainable ML approach to interpret NLI classifiers, identifying key linguistic traits that distinguish native languages and non-native writing.
Findings
Linguistic traits like lexical, morphological, and syntactic features are highly indicative of native language.
Explainable ML helps uncover which features classifiers rely on for NLI.
Case studies highlight specific traits of learners of English whose native language is Spanish or Italian.
Abstract
Native language identification (NLI) is the task of training (via supervised machine learning) a classifier that guesses the native language of the author of a text. This task has been extensively researched in the last decade, and the performance of NLI systems has steadily improved over the years. We focus on a different facet of the NLI task, i.e., that of analysing the internals of an NLI classifier trained by an explainable machine learning algorithm, in order to obtain explanations of its classification decisions, with the ultimate goal of gaining insight into which linguistic phenomena "give a speaker's native language away". We use this perspective in order to tackle both NLI and a (much less researched) companion task, i.e., guessing whether a text has been written by a native or a non-native speaker. Using three datasets of different provenance (two datasets of…
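As a rough illustration of the idea the abstract describes (train a classifier whose internals can be inspected, then read off which features drive its decisions), the sketch below fits a tiny logistic regression on word counts and ranks features by learned weight. It is not the authors' method: the abstract does not say which learning algorithm or feature set they use, so logistic regression and word unigrams are assumptions made here. The six sentences are invented toy examples, not drawn from the paper's datasets, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def featurize(text):
    """Map a text to word-unigram counts (a stand-in for richer linguistic features)."""
    counts = defaultdict(int)
    for token in text.lower().split():
        counts[token] += 1
    return dict(counts)


def score(weights, bias, features):
    """Linear score; positive values lean towards label 1."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in features.items())


def predict_proba(weights, bias, text):
    """Probability of label 1 for a raw text under the fitted model."""
    return 1.0 / (1.0 + math.exp(-score(weights, bias, featurize(text))))


def train(docs, labels, epochs=300, lr=0.5):
    """Binary logistic regression fitted by batch gradient descent."""
    feats = [featurize(d) for d in docs]
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-score(weights, bias, x)))
            err = p - y  # gradient of the log-loss with respect to the score
            grad_b += err
            for f, v in x.items():
                grad_w[f] += err * v
        bias -= lr * grad_b / len(docs)
        for f, g in grad_w.items():
            weights[f] -= lr * g / len(docs)
    return dict(weights), bias


def rank_features(weights):
    """Sort features from most label-1-indicative to most label-0-indicative."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy data (assumption): label 1 = "non-native", label 0 = "native".
DOCS = [
    "i am agree with the professor",
    "she is agree with this idea",
    "we are agree about the plan",
    "i agree with the professor",
    "she agrees with this idea",
    "we agree about the plan",
]
LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = train(DOCS, LABELS)
ranked = rank_features(weights)

# The "explanation" is simply the learned weight attached to each feature.
for feature, weight in ranked:
    print(f"{feature:10s} {weight:+.3f}")
```

Features with positive weights are those this toy model associates with label 1. Here the tokens that occur only in the label-1 sentences ("am", "is", "are") end up with positive weight by construction; on real data the ranking would need to be read alongside linguistic analysis rather than taken at face value.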
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
