Duluth UROP at SemEval-2018 Task 2: Multilingual Emoji Prediction with Ensemble Learning and Oversampling

Shuning Jin; Ted Pedersen

arXiv:1805.10267·cs.CL·May 28, 2018·1 cites

TL;DR

This paper presents a multilingual emoji prediction system that uses ensemble classifiers with oversampling to handle data skewness. It placed 19th of 48 systems in the English evaluation and 5th of 21 in the Spanish evaluation of SemEval-2018 Task 2.

Contribution

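The authors built ensembles of Naive Bayes, Logistic Regression, and Random Forest classifiers over unigram and bigram features, using oversampling to offset class skew in multilingual emoji prediction. They also report that simple preprocessing changes made after the evaluation would have raised their standing from 19th to sixth in English and from 5th to second in Spanish.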

Findings

01. Ensemble classifiers improved prediction accuracy.

02. Oversampling helped address data imbalance.

03. Preprocessing changes significantly boosted results.

Abstract

This paper describes the Duluth UROP systems that participated in SemEval-2018 Task 2, Multilingual Emoji Prediction. We relied on a variety of ensembles made up of classifiers using Naive Bayes, Logistic Regression, and Random Forests. We used unigram and bigram features and tried to offset the skewness of the data through the use of oversampling. Our task evaluation results place us 19th of 48 systems in the English evaluation, and 5th of 21 in the Spanish. After the evaluation, we realized that some simple changes to preprocessing could significantly improve our results. After making these changes, we attained results that would have placed us sixth in the English evaluation, and second in the Spanish.
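The three ingredients named in the abstract (unigram and bigram features, oversampling, and ensembling) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names are hypothetical, the whitespace tokenizer is a simplification, the oversampling shown is plain random duplication of minority-class examples, and hard majority voting is only one assumed way to combine classifiers, since the abstract does not say which ensembling scheme was used.

```python
import random
from collections import Counter


def ngram_features(text):
    """Count unigrams and bigrams in a lowercased, whitespace-tokenized text.

    Hypothetical helper: real tweet preprocessing would be more involved.
    """
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (an assumed oversampling scheme)."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```

In a setup like this, oversampling would normally be applied to the training split only, so that evaluation data keeps its original class distribution. Each classifier's predicted label for a tweet would then be passed to `majority_vote` as one element of the list.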

Code & Models

Repositories

shuningjin/SemEval2018-Task2-EmojiDetection (Official)

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Digital Communication and Language · Natural Language Processing Techniques

Methods: Logistic Regression