JUNLP@SemEval-2020 Task 9: Sentiment Analysis of Hindi-English code mixed data using Grid Search Cross Validation
Avishek Garain, Sainik Kumar Mahata, Dipankar Das

TL;DR
This paper presents a machine learning approach using feature extraction, SVR, and grid search for sentiment analysis of English-Hindi code-mixed data, achieving an F1-score of 66.2% in SemEval-2020.
Contribution
It introduces a novel application of grid search cross-validation with traditional ML algorithms for code-mixed sentiment analysis in Hindi-English data.
Findings
Achieved an F1-score of 66.2% on the SemEval-2020 dataset.
Demonstrated effectiveness of feature extraction with SVR and grid search in code-mixed sentiment analysis.
Provided a baseline approach for future research in multilingual code-mixed sentiment tasks.
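
As an illustration of what grid search cross-validation does mechanically, here is a minimal pure-Python sketch. It is not the authors' implementation: a toy one-feature ridge regressor stands in for SVR, the data are made up, and every name (k_fold_splits, fit_ridge, grid_search_cv, alpha) is hypothetical. In practice a library such as scikit-learn, which provides GridSearchCV and SVR, would normally be used instead.

```python
from itertools import product
from statistics import mean


def k_fold_splits(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; fold i validates on every n_folds-th sample."""
    for fold in range(n_folds):
        val_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, val_idx


def fit_ridge(xs, ys, alpha):
    """Toy one-feature ridge regressor (a stand-in for SVR); returns the weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    """Mean squared error of the predictions w * x."""
    return mean((w * x - y) ** 2 for x, y in zip(xs, ys))


def grid_search_cv(xs, ys, param_grid, n_folds=3):
    """Score every hyperparameter combination by its mean validation MSE."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_errors = []
        for train_idx, val_idx in k_fold_splits(len(xs), n_folds):
            # Fit on the training folds only, then score on the held-out fold.
            w = fit_ridge([xs[i] for i in train_idx],
                          [ys[i] for i in train_idx], **params)
            fold_errors.append(mse(w, [xs[i] for i in val_idx],
                                   [ys[i] for i in val_idx]))
        results.append((mean(fold_errors), params))
    best_error, best_params = min(results, key=lambda r: r[0])
    return best_params, best_error, results


# Hypothetical toy data: one numeric feature per sentence and a
# sentiment score in [-1, 1]. Not taken from the SemEval-2020 dataset.
features = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
scores = [-1.0, -0.8, -0.4, -0.3, 0.1, 0.2, 0.6, 0.7, 1.0]

best_params, best_error, results = grid_search_cv(
    features, scores, {"alpha": [0.0, 1.0, 10.0, 100.0]}
)
print(best_params, best_error)
```

The key design point is that each candidate setting is scored only on data it was not fitted on, and the setting with the best average held-out score is selected. With several hyperparameters, the grid grows as the product of the candidate lists.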
Abstract
Code-mixing is a phenomenon that arises mainly in multilingual societies. Multilingual people who are well versed in both their native languages and English tend to code-mix, using English-based phonetic typing and inserting anglicisms into their main language. This linguistic phenomenon poses a great challenge to conventional NLP domains such as Sentiment Analysis, Machine Translation, and Text Summarization, to name a few. In this work, we focus on working out a plausible solution for Code-Mixed Sentiment Analysis. This work was carried out as part of our participation in the SemEval-2020 Sentimix Task, where we focused on the sentiment analysis of English-Hindi code-mixed sentences. Our username for the submission was "sainik.mahata" and our team name was "JUNLP". We used feature extraction algorithms in conjunction with traditional machine learning algorithms such as SVR and…
