Multi-Task Sequence Prediction For Tunisian Arabizi Multi-Level Annotation
Elisa Gugliotta (1,2,3), Marco Dinarelli (2), Olivier Kraif (3); (1) Sapienza University of Rome, (2) Université Grenoble Alpes - Laboratoire LIG (Getalp group), (3) Université Grenoble Alpes - Laboratoire LIDILEM

TL;DR
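This paper introduces a multi-task neural system, based on recurrent neural networks, that annotates Tunisian Arabizi text at multiple levels in cascade. The architecture is evaluated on the TIGER German corpus, converted into a multi-task problem, and is then used to annotate a Tunisian Arabizi corpus. The system is implemented for the Fairseq framework.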
Contribution
The paper presents a novel multi-task sequence prediction system for Tunisian Arabizi annotation, combining multiple annotation tasks in a cascade using recurrent neural networks.
Findings
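Architecture shown to be effective on the TIGER German corpus, converted into a multi-task problem
System used to annotate a Tunisian Arabizi corpus, which was then manually corrected and used to further evaluate sequence models on Tunisian data
System developed for the Fairseq framework, so it can be reused for other sequence prediction problems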
Abstract
In this paper we propose a multi-task sequence prediction system, based on recurrent neural networks, which we use to annotate a Tunisian Arabizi corpus on multiple levels. The annotations performed are text classification, tokenization, PoS tagging, and encoding of Tunisian Arabizi into CODA* Arabic orthography. The system is trained to predict all the annotation levels in cascade, starting from the Arabizi input. To show the effectiveness of our neural architecture, we evaluate the system on the TIGER German corpus, suitably converting the data into a multi-task problem. We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data. Our system is developed for the Fairseq framework, which allows fast and easy use for any other sequence prediction problem.
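As a rough illustration of what predicting annotation levels "in cascade" can mean, the sketch below runs a sequence of stages in which each stage sees the input tokens plus the outputs of all earlier stages. It is not the authors' implementation and does not use Fairseq: the names `run_cascade`, `make_stub`, and `LEVELS` are hypothetical, the stage order simply follows the order in which the abstract lists the annotations, and the stub stages stand in for trained recurrent decoders.

```python
from typing import Callable, Dict, List, Tuple

# A stage maps (input tokens, outputs of earlier stages) to one label per token.
Stage = Callable[[List[str], Dict[str, List[str]]], List[str]]

# Assumed order, taken from the order in which the abstract lists the levels.
LEVELS = ["classification", "tokenization", "pos", "coda"]


def run_cascade(tokens: List[str],
                stages: List[Tuple[str, Stage]]) -> Dict[str, List[str]]:
    """Run the stages in order; each one receives all earlier outputs."""
    outputs: Dict[str, List[str]] = {}
    for name, stage in stages:
        # Pass a copy so a stage cannot modify earlier predictions.
        outputs[name] = stage(tokens, dict(outputs))
    return outputs


def make_stub(level: str) -> Stage:
    """Build a placeholder stage (hypothetical stand-in for a trained decoder).

    The label it emits records which earlier levels were visible to it,
    which makes the cascade dependency easy to inspect.
    """
    def stage(tokens: List[str], previous: Dict[str, List[str]]) -> List[str]:
        seen = "+".join(previous) or "input"
        return [f"{level}[{tok}|{seen}]" for tok in tokens]
    return stage


if __name__ == "__main__":
    # "w1" and "w2" are placeholder tokens, not real Arabizi data.
    result = run_cascade(["w1", "w2"], [(lvl, make_stub(lvl)) for lvl in LEVELS])
    for level, labels in result.items():
        print(level, labels)
```

In a trained system each stub would be replaced by a decoder conditioned on the encoded input and on the earlier predictions; the sketch only shows the data flow between levels.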
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
