WEKA-Based: Key Features and Classifier for French of Five Countries

Zeqian Li; Keyu Qiu; Chenxu Jiao; Wen Zhu; Haoran Tang

arXiv:2212.08132·cs.CL·December 19, 2022

TL;DR

This paper presents a dialect recognition system for French from five regions. It uses a themed regional corpus and WEKA machine learning tools to distinguish the regional varieties.

Contribution

It introduces an approach to French dialect classification that combines a new five-region corpus with WEKA machine learning classifiers.

Findings

01

Effective differentiation of regional French dialects achieved

02

Used WEKA classifiers with a thematic corpus for dialect recognition

03

Demonstrated feasibility of machine learning in dialect classification

Abstract

This paper describes a French dialect recognition system that distinguishes between regional French dialects. The corpus covers five regions (Monaco, French-speaking Belgium, French-speaking Switzerland, French-speaking Canada, and France) and was built using Sketch Engine. Its content relates to four themes, eating, drinking, sleeping and living, which are closely linked to everyday life. The experimental results were obtained with a Python-coded preprocessor and the Waikato Environment for Knowledge Analysis (WEKA), a data analysis tool that provides many filters and classifiers for machine learning.
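The abstract does not say what the Python preprocessor does, so the following is only a minimal sketch of one plausible step: turning labeled text samples into an ARFF file, the input format WEKA reads. The function names (`normalize`, `arff_quote`, `to_arff`), the relation name, the region labels, and the two sample sentences are all assumptions made for illustration and do not come from the paper.

```python
import re

# Hypothetical class labels, one per region named in the abstract.
REGIONS = ["Monaco", "Belgium", "Switzerland", "Canada", "France"]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace; accented letters are kept."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def arff_quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_arff(samples, relation="french_dialects"):
    """Build ARFF text from an iterable of (text, region) pairs."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute region {" + ",".join(REGIONS) + "}",
        "",
        "@data",
    ]
    for text, region in samples:
        if region not in REGIONS:
            raise ValueError(f"unknown region: {region}")
        lines.append(f"{arff_quote(normalize(text))},{region}")
    return "\n".join(lines) + "\n"


# Made-up sentences, used only to exercise the functions.
samples = [
    ("J'ai mangé   une poutine", "Canada"),
    ("On va déjeuner à midi", "France"),
]
arff_text = to_arff(samples)
```

Writing `arff_text` to a `.arff` file would let WEKA load it. A string attribute such as `text` usually has to be converted to numeric features before most classifiers accept it, for example with WEKA's `StringToWordVector` filter; whether the authors used that filter is not stated in the abstract.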

Taxonomy

Topics: Natural Language Processing Techniques