dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

Mohamed Lichouri; Khaled Lounnas; Boualem Nadjib Zahaf; Mehdi Ayoub Rabiai

arXiv:2407.13608·cs.CL·July 19, 2024

TL;DR

This paper describes a multi-classifier ensemble approach using weighted voting and TF-IDF features for dialect identification, achieving high precision but low recall in a shared task setting.

Contribution

It introduces a simple ensemble method combining traditional classifiers and feature weighting strategies for dialect identification.

Findings

01 Achieved highest precision of 63.22% among participants.

02 F1 score was around 21%, with recall at 12.87%.

03 Ensemble approach demonstrated competitive performance despite simplicity.
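
As a quick consistency check on the figures above, the reported precision and recall can be combined into an F1 value. This is a minimal sketch that assumes F1 is the plain harmonic mean of the two reported numbers; if the shared task reports macro-averaged scores (an assumption not confirmed here), the official F1 need not match this value exactly.

```python
# Reported figures from the findings above, in percent.
precision = 63.22
recall = 12.87

# Harmonic mean of precision and recall (the standard F1 definition).
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 from reported precision and recall: {f1:.2f}%")
```

The result is roughly 21.4%, in line with the "around 21%" figure, and it shows how the low recall pulls the F1 score far below the precision.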

Abstract

This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically in Subtask 1 - Multi-label Country-level Dialect Identification (MLDID) (Closed Track). We explored various configurations to address the challenge: in Experiment 1, we utilized a union of n-gram analyzers (word, character, character with word boundaries) with different n-gram values; in Experiment 2, we combined a weighted union of Term Frequency-Inverse Document Frequency (TF-IDF) features with various weights; and in Experiment 3, we implemented a weighted majority voting scheme using three classifiers: Linear Support Vector Classifier (LSVC), Random Forest (RF), and K-Nearest Neighbors (KNN). Our approach, despite its simplicity and reliance on traditional machine learning techniques, demonstrated competitive performance in terms of F1-score and precision. Notably, we achieved the…
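
The weighted voting idea in Experiment 3 can be illustrated with a small, library-free sketch. It is not the authors' implementation: the function name `weighted_majority_vote`, the country labels, and the weights are all hypothetical, and it handles a single label per text rather than the multi-label setting of the subtask.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: dict mapping classifier name -> predicted label
    weights:     dict mapping classifier name -> voting weight
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical predictions from the three classifiers for one text.
predictions = {"lsvc": "Algeria", "rf": "Egypt", "knn": "Egypt"}

# Hypothetical weights; the actual values are not given in this abstract.
weights = {"lsvc": 0.5, "rf": 0.3, "knn": 0.1}

winner = weighted_majority_vote(predictions, weights)
print(winner)
```

With these illustrative weights, the single LSVC vote (0.5) outweighs the combined RF and KNN votes (0.4), whereas equal weights would let the two-classifier majority win. That difference is what separates weighted voting from a plain majority vote.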

Taxonomy

Topics: Data Stream Mining Techniques