On the logistical difficulties and findings of Jopara Sentiment Analysis

Marvin M. Agüero-Torales; David Vilares; Antonio G. López-Herrera

arXiv:2105.02947·cs.CL·May 12, 2021

TL;DR

This study explores sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. It highlights the difficulty of collecting quality data and compares neural and traditional models: transformers perform best even though Guarani was not part of their pre-training.

Contribution

It provides a corpus of Guarani-dominant tweets and evaluates neural versus traditional machine learning models for low-resource Jopara sentiment analysis.

Findings

01

Transformers outperform traditional models in Jopara sentiment analysis.

02

Traditional machine learning models perform close to neural models in low-resource settings.

03

Data collection for low-resource languages remains a significant challenge.

Abstract

This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning ones in this low-resource setup. Transformer architectures obtain the best results, despite not considering Guarani during pre-training, but traditional machine learning models perform close due to the low-resource nature of the problem.
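The abstract does not say which traditional machine learning models were used. As a rough illustration of the kind of baseline that tends to stay competitive when training data is scarce, the sketch below implements a multinomial Naive Bayes classifier over character trigrams using only the Python standard library. The choice of Naive Bayes, the trigram features, the `NaiveBayes` class, and the four toy sentences are all assumptions made for illustration. None of them comes from the paper or its corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative, hypothetical class)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)           # documents per class
        self.feature_counts = defaultdict(Counter)    # n-gram counts per class
        self.vocab = set()                            # all n-grams seen in training
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)     # log prior
            for gram in char_ngrams(text):
                if gram in self.vocab:                # ignore unseen n-grams
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences mixing Guarani and Spanish words; NOT taken from the paper's corpus.
train_texts = [
    "iporã la música",
    "qué lindo, iporã",
    "ivai la comida",
    "qué feo, ivai",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("muy lindo iporã"))
print(model.predict("muy feo ivai"))
```

Character n-grams are a common choice for noisy, code-switched social media text because they need no language-specific tokenizer, which is why they are used in this sketch.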

Code & Models

Repositories

mmaguero/josa-corpus (official)

Taxonomy

Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Dropout · Softmax · Layer Normalization · Label Smoothing