Analyzing Language Bias Between French and English in Conventional Multilingual Sentiment Analysis Models
Ethan Parker Wong, Faten M'hiri

TL;DR
This study investigates language bias in multilingual sentiment analysis between French and English, revealing biases favoring French and assessing fairness using Fairlearn across different models and datasets.
Contribution
It provides an empirical analysis of language bias in sentiment classification models and evaluates fairness metrics, highlighting the need for equitable multilingual NLP systems.
Findings
French data outperforms English in accuracy, recall, and F1 scores.
Fairlearn indicates near-equity for SVM models across datasets.
Naive Bayes shows greater disparities in demographic parity ratios.
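The fairness quantities named in the findings can be illustrated with a minimal sketch. It assumes binary sentiment labels (1 = positive) and treats the language of each text as the sensitive feature. The function names and the toy data are hypothetical and are not taken from the paper. The ratio is computed by hand as the smallest group selection rate divided by the largest, which is the usual definition of a demographic parity ratio. It is not a call into Fairlearn itself.

```python
from collections import defaultdict


def selection_rates(y_pred, groups, positive=1):
    """Share of examples predicted positive, per group (here: per language)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [predicted positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == positive)
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def demographic_parity_ratio(y_pred, groups, positive=1):
    """Smallest group selection rate divided by the largest (1.0 means parity)."""
    rates = selection_rates(y_pred, groups, positive)
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")


def recall_by_group(y_true, y_pred, groups, positive=1):
    """Recall on the positive class, computed separately for each group."""
    hits = defaultdict(int)       # true positives per group
    positives = defaultdict(int)  # actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            positives[group] += 1
            hits[group] += int(pred == positive)
    return {group: hits[group] / positives[group] for group in positives}


# Toy data, invented purely for illustration (NOT results from the paper).
groups = ["fr", "fr", "fr", "fr", "en", "en", "en", "en"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(selection_rates(y_pred, groups))
print(demographic_parity_ratio(y_pred, groups))
print(recall_by_group(y_true, y_pred, groups))
```

A ratio close to 1 corresponds to the "near-equity" described for the SVM models, while a lower ratio corresponds to the larger disparity reported for Naive Bayes.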
Abstract
Inspired by the 'Bias Considerations in Bilingual Natural Language Processing' report by Statistics Canada, this study examines potential biases in multilingual sentiment analysis between English and French. Given a 50-50 dataset of French and English, we aim to determine whether a language bias exists and to explore how incorporating more diverse datasets in the future might affect the equity of multilingual Natural Language Processing (NLP) systems. By employing Support Vector Machine (SVM) and Naive Bayes models on three balanced datasets, we reveal potential biases in multilingual sentiment classification. Using Fairlearn, a tool for assessing bias in machine learning models, we find nuanced outcomes. French data outperforms English across accuracy, recall, and F1 score in both models, hinting at a language bias favoring French. However,…
