USTHB at NADI 2023 shared task: Exploring Preprocessing and Feature Engineering Strategies for Arabic Dialect Identification

Mohamed Lichouri; Khaled Lounnas; Aicha Zitouni; Houda Latrache; Rachida Djeradi

arXiv:2312.10536 · cs.CL · December 19, 2023 · 1 citation

TL;DR

This paper analyzes how preprocessing and feature engineering choices affect country-level Arabic dialect identification in the first subtask of NADI 2023. The resulting system reaches an F1 score of 62.51%.

Contribution

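It evaluates the effects of surface preprocessing, morphological preprocessing, FastText vectors, and weighted concatenation of TF-IDF features on Arabic dialect identification, using a Linear Support Vector Classification (LSVC) model as the classifier.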

Findings

01

Preprocessing and feature engineering significantly influence classification accuracy.

02

The system achieved an F1 score of 62.51%; the average F1 score of the systems submitted to the first subtask was 72.91%.

03

Model performance varies with the combination of features used.

Abstract

In this paper, we conduct an in-depth analysis of several key factors influencing the performance of Arabic dialect identification in NADI'2023, with a specific focus on the first subtask, country-level dialect identification. Our investigation covers the effects of surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted concatenation of TF-IDF features. For classification, we employ the Linear Support Vector Classification (LSVC) model. During the evaluation phase, our system demonstrates noteworthy results, achieving an F1 score of 62.51%. This achievement closely aligns with the average F1 score attained by other systems submitted for the first subtask, which stands at 72.91%.
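
The feature and classifier setup named in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram ranges, the weights 0.4 and 0.6, the choice of word-level and character-level TF-IDF views, the macro averaging of F1, and the toy sentences are all hypothetical, and the preprocessing and FastText components are left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Made-up toy sentences for illustration only (not NADI data).
texts = [
    "ازيك عامل ايه",
    "عايز اروح البيت",
    "واش راك خويا",
    "راني رايح للدار",
]
labels = ["Egypt", "Egypt", "Algeria", "Algeria"]

# Weighted concatenation: each TF-IDF block is scaled by its weight,
# then the blocks are stacked side by side into one feature matrix.
# The n-gram ranges and weights below are assumed values.
features = FeatureUnion(
    transformer_list=[
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 5))),
    ],
    transformer_weights={"word": 0.4, "char": 0.6},
)

# Linear Support Vector Classification on top of the combined features.
model = Pipeline([("features", features), ("clf", LinearSVC())])
model.fit(texts, labels)

predictions = model.predict(texts)
# Scored on the training sentences only to show the call; this is not
# a meaningful evaluation. Macro averaging is an assumption.
macro_f1 = f1_score(labels, predictions, average="macro")
print(list(predictions), macro_f1)
```

In a real experiment the weights would be tuned on a development set, and the score would be computed on held-out data rather than on the training sentences.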

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Speech Recognition and Synthesis

Methods: Focus · fastText