LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment Analysis of Hinglish Social Media Text

Pranaydeep Singh; Els Lefever

arXiv:2010.11019·cs.CL·October 22, 2020

TL;DR

This paper explores cross-lingual embedding techniques for sentiment analysis of Hinglish social media text, comparing embedding projection and incremental retraining methods, with the latter achieving the best performance.

Contribution

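The paper evaluates two approaches to Hinglish sentiment analysis built on cross-lingual embeddings: projecting Hinglish and pre-trained English embeddings into a shared space, and incrementally retraining pre-trained English embeddings on Hinglish tweets. It shows that incremental retraining is the more effective of the two.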

Findings

01

Incrementally retraining pre-trained English embeddings on Hinglish tweets yields the higher F1-score of the two approaches.

02

Cross-lingual embedding projection is less effective.

03

The incremental retraining approach achieved an F1-score of 70.52% on the held-out test data.

Abstract

This paper describes our contribution to the SemEval-2020 Task 9 on Sentiment Analysis for Code-mixed Social Media Text. We investigated two approaches to solve the task of Hinglish sentiment analysis. The first approach uses cross-lingual embeddings resulting from projecting Hinglish and pre-trained English FastText word embeddings into the same space. The second approach incorporates pre-trained English embeddings that are incrementally retrained with a set of Hinglish tweets. The results show that the second approach performs best, with an F1-score of 70.52% on the held-out test data.
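
To make the first approach concrete, the following is a minimal sketch of one common way to project two embedding spaces into a shared space: orthogonal Procrustes alignment over a seed dictionary of word pairs. The abstract does not say which projection method the authors used, so the technique, the function name `procrustes_align`, and the randomly generated toy vectors are all assumptions for illustration and do not reproduce the paper's pipeline.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of each holds the
    vectors for the i-th seed dictionary pair (hypothetical data here).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


rng = np.random.default_rng(0)

# Toy stand-ins for real embeddings: 50 dictionary pairs of 8-dim vectors.
english = rng.normal(size=(50, 8))

# Simulate a "Hinglish" space as a rotated copy of the English space.
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
hinglish = english @ true_rotation.T

# Learn the mapping from the Hinglish space into the English space.
W = procrustes_align(hinglish, english)
projected = hinglish @ W

# Residual distance between projected Hinglish and English vectors.
alignment_error = float(np.linalg.norm(projected - english))
```

In this toy setup the Hinglish space is an exact rotation of the English one, so the learned mapping recovers it almost perfectly. Real Hinglish and English embeddings are only approximately related by a rotation, which is one plausible reason a projection-based approach could lag behind retraining the embeddings directly on Hinglish tweets.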

Taxonomy

Methods: fastText