LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment Analysis of Hinglish Social Media Text
Pranaydeep Singh, Els Lefever

TL;DR
This paper explores cross-lingual embedding techniques for sentiment analysis of Hinglish social media text, comparing embedding projection and incremental retraining methods, with the latter achieving the best performance.
Contribution
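It evaluates two approaches to Hinglish sentiment analysis based on cross-lingual embeddings: projecting Hinglish and pre-trained English FastText embeddings into a shared space, and incrementally retraining pre-trained English embeddings on Hinglish tweets. Incremental retraining proves the more effective of the two.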
Findings
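Incremental retraining of pre-trained English embeddings on Hinglish tweets yields a higher F1-score than cross-lingual embedding projection.
Cross-lingual embedding projection is the less effective of the two approaches.
The incremental retraining approach achieved an F1-score of 70.52% on the held-out test data.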
Abstract
This paper describes our contribution to the SemEval-2020 Task 9 on Sentiment Analysis for Code-mixed Social Media Text. We investigated two approaches to solve the task of Hinglish sentiment analysis. The first approach uses cross-lingual embeddings resulting from projecting Hinglish and pre-trained English FastText word embeddings in the same space. The second approach incorporates pre-trained English embeddings that are incrementally retrained with a set of Hinglish tweets. The results show that the second approach performs best, with an F1-score of 70.52% on the held-out test data.
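The abstract does not say how the two embedding spaces are projected into the same space. As an illustration only, the sketch below uses orthogonal Procrustes alignment, a common choice for this kind of mapping and an assumption here rather than the authors' confirmed method. The vectors are random toy data standing in for FastText embeddings, and the names `learn_orthogonal_map`, `english`, and `hinglish` are hypothetical.

```python
import numpy as np


def learn_orthogonal_map(src, tgt):
    """Return the orthogonal matrix W that minimises ||src @ W - tgt||_F.

    src and tgt hold one embedding per row; row i of src and row i of tgt
    are assumed to be a translation pair from a seed dictionary.
    """
    # Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of src^T tgt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


rng = np.random.default_rng(0)
dim, n_pairs = 5, 20

# Toy stand-in for pre-trained English embeddings (random, not real FastText).
english = rng.normal(size=(n_pairs, dim))

# Toy stand-in for Hinglish embeddings: the same points under a hidden rotation.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
hinglish = english @ hidden_rotation.T

# Learn the map from the Hinglish space into the English space and apply it.
W = learn_orthogonal_map(hinglish, english)
projected = hinglish @ W

max_error = float(np.abs(projected - english).max())
print(f"largest alignment error: {max_error:.2e}")
```

Because the toy Hinglish vectors are an exact rotation of the English ones, the learned map recovers the alignment up to floating-point error. Real embeddings trained on different corpora would align only approximately.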
Taxonomy
Methods: fastText
