Multilingual Twitter Sentiment Classification: The Role of Human Annotators

Igor Mozetic; Miha Grcar; Jasmina Smailovic

arXiv:1602.07563·cs.CL·August 31, 2021

TL;DR

This paper investigates the impact of training data quality, especially human annotation agreement, on multilingual Twitter sentiment classification, showing that larger, high-quality datasets lead to models approaching human agreement levels.

Contribution

The paper demonstrates that the quality and size of the training data matter more than the type of model trained, and highlights the importance of monitoring annotator agreement to improve sentiment classification.

Findings

01

Model performance approaches the inter-annotator agreement when the training set is sufficiently large.

02

Training data quality significantly influences classification accuracy.

03

Humans perceive sentiment classes as ordered.

Abstract

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the…
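The abstract refers to "various annotator agreement measures" without naming them in the portion shown here. As a minimal sketch, assuming Cohen's kappa as one standard chance-corrected measure (not necessarily the one the authors use), agreement between two annotators who labeled the same tweets could be computed as follows. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Assumption: kappa stands in for the agreement measures mentioned
    in the abstract; the paper may use different ones.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, not data from the paper.
annotator_1 = ["neg", "neu", "pos", "pos", "neg", "neu"]
annotator_2 = ["neg", "neu", "pos", "neu", "neg", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

The same function applies to self-agreement if the two lists hold one annotator's labels for tweets they were shown twice. Note that plain kappa treats all disagreements alike; a measure that respects the ordering of sentiment classes would penalize negative-versus-positive confusions more than negative-versus-neutral ones.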
