Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages
Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

TL;DR
This paper introduces a method for sentiment analysis on endangered Uralic languages by translating and aligning word embeddings from a majority language and applying a neural network trained on English data.
Contribution
The paper presents a novel approach to adapt sentiment analysis models to endangered languages using aligned word embeddings and a neural network trained on English.
Findings
Achieved at least 56% accuracy on sentiment analysis for each endangered language.
Developed a translation and alignment method for word embeddings across languages.
Released models and sentiment corpus for future research.
Abstract
In this paper, we present an approach for translating word embeddings from a majority language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied to endangered language data through the aligned word embeddings. To test our model, we annotated a small sentiment analysis corpus for the 4 endangered languages and Finnish. Our method reached at least 56% accuracy for each endangered language. The models and the sentiment corpus will be released together with this paper. Our research shows that state-of-the-art neural models can be used with endangered languages, with the only requirement being a dictionary between the endangered language and a majority language.
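The two embedding steps named in the abstract, dictionary-based translation and alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how translations are combined or which alignment procedure is used, so averaging over translations and orthogonal Procrustes alignment are both assumptions here, and the function names `translate_embeddings` and `procrustes_align` are hypothetical.

```python
import numpy as np


def translate_embeddings(dictionary, majority_vectors):
    """Give each minority-language word the mean vector of its known translations.

    dictionary: maps a minority-language word to a list of majority-language words.
    majority_vectors: maps a majority-language word to a 1-D NumPy array.
    Averaging over translations is an assumption, not the paper's stated method.
    """
    translated = {}
    for word, translations in dictionary.items():
        known = [majority_vectors[t] for t in translations if t in majority_vectors]
        if known:  # skip words with no translation in the embedding vocabulary
            translated[word] = np.mean(known, axis=0)
    return translated


def procrustes_align(source, target):
    """Return the orthogonal matrix W that minimizes ||source @ W - target||_F.

    source, target: (n_pairs, dim) arrays whose rows are vectors of word pairs
    known to be translations of each other. Orthogonal Procrustes is one common
    alignment choice and is assumed here for illustration.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt
```

Under these assumptions, a vector `v` from the translated embedding space would be mapped into the English space as `v @ W`, so a classifier trained on English vectors could be applied to it without retraining.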
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Linguistics and Cultural Studies
Methods: Test · ALIGN
