TL;DR
This study explores sentiment analysis for Jopara, a code-switching language that mixes Guarani and Spanish. It highlights the difficulty of collecting quality data and compares neural and traditional models. Transformers perform best even though Guarani was not part of their pre-training, while traditional models come close.
Contribution
It provides a corpus of Guarani-dominant tweets and evaluates neural versus traditional machine learning models for low-resource Jopara sentiment analysis.
Findings
Transformers outperform traditional models in Jopara sentiment analysis.
Traditional machine learning models perform nearly as well as neural models in this low-resource setting.
Data collection for low-resource languages remains a significant challenge.
Abstract
This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they perform better than traditional machine learning ones in this low-resource setup. Transformer architectures obtain the best results, despite not considering Guarani during pre-training, but traditional machine learning models come close due to the low-resource nature of the problem.
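The abstract does not say which traditional machine learning models were used. As a minimal sketch of what such a baseline could look like, the example below assumes a multinomial Naive Bayes classifier over character trigrams, chosen here only for illustration and not taken from the paper. The class name `NaiveBayes`, the helper `char_ngrams`, and the four training strings are hypothetical; the strings are invented toy examples that loosely mix Guarani and Spanish words and are not drawn from the paper's corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (padded with spaces)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.feature_counts = defaultdict(Counter)  # label -> n-gram counts
        self.label_counts = Counter()               # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.label_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(text):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data (hypothetical, NOT from the paper's corpus).
train_texts = [
    "iporã la película",
    "iporã el partido",
    "ivai la película",
    "ivai el partido",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("iporã la música")
print(prediction)
```

Character n-grams are a common choice for code-switched and morphologically rich text because they need no language-specific tokenizer. Whether the paper's baselines used them is not stated in the abstract.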
Taxonomy
Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Dropout · Softmax · Layer Normalization · Label Smoothing
