UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish
Marloes Kuijper, Mike van Lenthe, Rik van Noord

TL;DR
This study shows that automatically generating additional training data, through translation and semi-supervised learning, improves emotion intensity prediction for Spanish tweets. The system placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the Spanish subtasks of SemEval-2018 Task 1.
Contribution
The paper augments training data for emotion intensity prediction in Spanish in two ways: translating training data from other languages and applying a semi-supervised learning method. Models trained with the additional data outperform the authors' regular models in all subtasks.
Findings
Models with augmented data outperform regular models in all subtasks.
Both translating training data and semi-supervised learning receive strong support as ways to improve predictions.
A stepwise ensemble of different models did not perform better than simply averaging their predictions.
Abstract
The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models as opposed to simply averaging did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
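The contrast the abstract draws between simple averaging and a stepwise ensemble can be illustrated with a minimal sketch. The abstract does not spell out the authors' exact procedure, so everything below is an assumption: the stepwise variant is a generic greedy forward selection, Pearson correlation on a development set is used as the selection score, and the model names and numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input; treat as 0
    return cov / sqrt(vx * vy)


def average(preds_list):
    """Simple averaging: mean prediction per example across models."""
    return [sum(col) / len(col) for col in zip(*preds_list)]


def stepwise_ensemble(dev_preds, gold):
    """Greedy forward selection (an assumed stand-in for 'stepwise').

    dev_preds maps a model name to its dev-set predictions. At each step,
    add the model whose inclusion most improves the dev score of the
    averaged predictions; stop when no remaining model improves it.
    """
    remaining = dict(dev_preds)
    selected, best_score = [], float("-inf")
    while remaining:
        candidate, cand_score = None, best_score
        for name, preds in remaining.items():
            trial = average([dev_preds[n] for n in selected] + [preds])
            score = pearson(trial, gold)
            if score > cand_score:
                candidate, cand_score = name, score
        if candidate is None:
            break
        selected.append(candidate)
        best_score = cand_score
        del remaining[candidate]
    return selected, best_score


# Hypothetical dev-set scores for three models (illustrative numbers only).
gold = [0.1, 0.4, 0.5, 0.9]
dev_preds = {
    "a": [0.1, 0.4, 0.5, 0.9],
    "b": [0.5, 0.2, 0.9, 0.3],
    "c": [0.9, 0.5, 0.4, 0.1],
}
selected, score = stepwise_ensemble(dev_preds, gold)
avg_score = pearson(average(list(dev_preds.values())), gold)
```

Because the stepwise variant tunes its model selection on the development set, it can overfit that set, which is one plausible reason such a procedure might not beat plain averaging on test data; the abstract itself only reports that no increase in performance was observed.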
