The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus

Samia Touileb; Jeremy Barnes

arXiv:2105.07400·cs.CL·June 1, 2021

TL;DR

This paper investigates how language similarity and script differences affect cross-lingual transfer, using a new Algerian dialect corpus written in multiple scripts. The relationship between the two factors proves complex, especially for POS tagging.

Contribution

It introduces a novel multi-layer Algerian dialect corpus with parallel annotations across scripts and explores the impact of script and typological similarity on transfer performance.

Findings

01. Script and typology influence POS transfer differently

02. Sentiment analysis is less affected by script and typology differences

03. Fine-tuning multilingual models reveals nuanced effects of language and script

Abstract

Recent years have seen a rise in interest in cross-lingual transfer between languages with similar typology, and between languages written in different scripts. However, the interplay between language similarity and difference in script in cross-lingual transfer is a less studied problem. We explore this interplay for two supervised tasks, namely part-of-speech tagging and sentiment analysis. We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment and topic categories. We perform baseline experiments by fine-tuning multi-lingual language models. We further explore the effect of script vs. language similarity in cross-lingual transfer by fine-tuning multi-lingual models on languages which are a) typologically distinct,…
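
To make the script distinction in the abstract concrete, here is a minimal Python sketch that labels a comment as Latin, Arabic, or mixed by looking at the Unicode names of its letters. It is an illustration only, not the authors' annotation procedure: the function names `char_script` and `comment_script`, the label strings, and the example comments are all assumptions, and "mixed" is only a rough proxy for the corpus's code-switched category.

```python
import unicodedata


def char_script(ch):
    """Return 'arabic', 'latin', or None for a single character.

    Hypothetical helper: digits, punctuation, and diacritics are ignored
    because they are not alphabetic.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("ARABIC"):
        return "arabic"
    if name.startswith("LATIN"):
        return "latin"
    return None


def comment_script(text):
    """Label a comment by the scripts of its letters (assumed label set)."""
    scripts = {s for s in map(char_script, text) if s is not None}
    if scripts == {"arabic"}:
        return "arabic"
    if scripts == {"latin"}:
        return "latin"
    if scripts == {"arabic", "latin"}:
        return "mixed"
    return "other"  # no Latin or Arabic letters found


# Made-up example comments, not taken from the corpus.
for comment in ["wach rak", "واش راك", "wach راك"]:
    print(comment_script(comment))
```

A character-level check like this says nothing about which language a token belongs to, so it cannot separate Algerian written in Latin script from French or English; that distinction needs token-level annotation of the kind the corpus provides.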

Code & Models

Repositories

SamiaTouileb/Narabizi
tf · Official
