UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish

Marloes Kuijper; Mike van Lenthe; Rik van Noord

arXiv:1805.10824 · cs.CL · May 29, 2018

TL;DR

Automatically generating additional training data, both by translating training data from other languages and by semi-supervised learning, improves emotion intensity prediction for Spanish tweets. The system placed second, second, fourth and fifth in the four Spanish subtasks of SemEval-2018 Task 1 that it entered.

Contribution

The paper introduces a method of augmenting training data for emotion prediction in Spanish by translating data from other languages and using semi-supervised learning, demonstrating performance gains.
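The two augmentation steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `translate` callable stands in for whatever machine translation system is used, `fit_word_average` is a toy regressor, and `build_training_set` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Example = Tuple[str, float]        # (tweet text, intensity score in [0, 1])
Model = Callable[[str], float]


def fit_word_average(data: List[Example]) -> Model:
    """Toy regressor (hypothetical): score a tweet by the mean training
    score of the words it shares with the training data."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, score in data:
        for word in set(text.lower().split()):
            totals[word] += score
            counts[word] += 1
    fallback = sum(score for _, score in data) / len(data)

    def predict(text: str) -> float:
        known = [totals[w] / counts[w] for w in text.lower().split() if w in counts]
        return sum(known) / len(known) if known else fallback

    return predict


def build_training_set(
    spanish: List[Example],
    foreign: List[Example],
    unlabeled: List[str],
    translate: Callable[[str], str],
    fit: Callable[[List[Example]], Model] = fit_word_average,
) -> List[Example]:
    """Return gold Spanish data plus translated and silver-labeled examples."""
    # (i) Translate foreign-language examples into Spanish, keeping gold scores.
    translated = [(translate(text), score) for text, score in foreign]
    # (ii) Train on gold + translated data, then label unlabeled Spanish tweets.
    model = fit(spanish + translated)
    silver = [(text, model(text)) for text in unlabeled]
    return spanish + translated + silver


# Toy usage: a dictionary lookup plays the role of the translation system.
toy_dictionary = {"very angry": "muy furioso"}
augmented = build_training_set(
    spanish=[("estoy furioso", 0.9), ("estoy tranquilo", 0.1)],
    foreign=[("very angry", 0.8)],
    unlabeled=["muy furioso hoy"],
    translate=lambda text: toy_dictionary[text],
)
```

A final model would then be trained on `augmented`. The sketch only shows how data flows through the two steps. Whether the silver labels help depends on the models involved, which the summary here does not specify.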

Findings

01

Models with augmented data outperform regular models in all subtasks.

02

Both translating training data from other languages and semi-supervised learning improve prediction performance.

03

A stepwise ensemble of different models did not outperform simply averaging their predictions.

Abstract

The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models as opposed to simply averaging did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
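The contrast between simple averaging and a stepwise ensemble can be sketched as follows. The abstract does not spell out the stepwise procedure, so this assumes a greedy forward selection that adds a model only while the averaged predictions improve Pearson correlation on development data. Pearson correlation is the official metric of the task's regression and ordinal subtasks. The names `pearson`, `average` and `stepwise_ensemble` are hypothetical helpers.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def average(predictions: List[Sequence[float]]) -> List[float]:
    """Simple ensemble: mean prediction of all models for each tweet."""
    return [sum(column) / len(column) for column in zip(*predictions)]


def stepwise_ensemble(dev_preds: Dict[str, List[float]], gold: List[float]) -> List[str]:
    """Greedy forward selection (an assumed reading of 'stepwise'):
    keep adding the model that most improves the dev score of the
    averaged predictions, and stop once no candidate improves it."""
    remaining = dict(dev_preds)
    chosen: List[str] = []
    best = float("-inf")
    while remaining:
        scores = {
            name: pearson(average([dev_preds[c] for c in chosen] + [preds]), gold)
            for name, preds in remaining.items()
        }
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best:
            break
        best = scores[candidate]
        chosen.append(candidate)
        del remaining[candidate]
    return chosen
```

Calling `average(list(dev_preds.values()))` gives the plain averaging baseline, while `stepwise_ensemble(dev_preds, gold)` returns the names of the models a greedy selection would keep. Selection of this kind is tuned on the development set, so it can overfit to it.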
