Data Collection and Analysis of French Dialects

Omar Shaur Choudhry; Paul Omara Odida; Joshua Reiner; Keiron Appleyard; Danielle Kushnir; William Toon

arXiv:2208.00752·cs.CL·August 2, 2022

TL;DR

This paper presents the creation and analysis of a new French dialect dataset, applies machine learning classifiers to samples of dialect text, and evaluates the data mining techniques used, following the CRISP-DM methodology.

Contribution

It introduces a new French dialect dataset and evaluates machine learning classifiers for dialect classification using a structured data mining approach.

Findings

01 Effective classifiers identified for dialect classification

02 Key features for distinguishing dialects determined

03 Data quality issues addressed and mitigated

Abstract

This paper discusses creating and analysing a new dataset for data mining and text analytics research, contributing to a joint Leeds University research project for the Corpus of National Dialects. This report investigates machine learning classifiers to classify samples of French dialect text across various French-speaking countries. Following the steps of the CRISP-DM methodology, this report explores the data collection process, data quality issues and data conversion for text analysis. Finally, after applying suitable data mining techniques, the evaluation methods, best overall features and classifiers and conclusions are discussed.
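To make the classification task in the abstract concrete, here is a minimal sketch of a dialect text classifier. The abstract does not name the classifiers or features the authors used, so this is a stand-in: a multinomial Naive Bayes model over character trigrams, written with the Python standard library only. The class name `NaiveBayesDialectClassifier`, the helper `char_ngrams`, the labels `CA` and `BE`, and the four training sentences are all hypothetical and invented for illustration; none of them come from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesDialectClassifier:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing.

    Hypothetical illustration only; not the method used in the paper.
    """

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.doc_counts = Counter()               # label -> number of training samples
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(counts.values())
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy samples (NOT from the paper's dataset), labelled by country code.
texts = [
    "je vais magasiner avec mon char",
    "on va magasiner en fin de semaine",
    "il a nonante ans et septante euros",
    "elle compte septante et nonante",
]
labels = ["CA", "CA", "BE", "BE"]

model = NaiveBayesDialectClassifier(n=3).fit(texts, labels)
print(model.predict("septante"))
```

Character n-grams are used here because they capture spelling and regional vocabulary without a tokenizer. A real experiment would need far more data per dialect and a held-out test split for evaluation.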

Code & Models

Repositories

omariosc/Data-Collection-and-Analysis-of-French-Dialects
Official

Taxonomy

Topics: Natural Language Processing Techniques