Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

Elena Zotova; Rodrigo Agerri; German Rigau

arXiv:2101.11978·cs.CL·January 29, 2021

TL;DR

This paper introduces a semi-automatic method to generate large, balanced, and multilingual stance detection datasets for Twitter by leveraging user-based information, addressing the scarcity of resources in multiple languages.

Contribution

The paper presents a novel semi-automatic approach that reduces manual annotation effort for multilingual stance detection datasets in social media.

Findings

01. Method effectively creates large, balanced multilingual datasets

02. Empirical results demonstrate improved data quality for stance detection

03. Qualitative analysis confirms the method's adaptability to other NLP tasks

Abstract

Popular social media networks provide the perfect environment to study the opinions and attitudes expressed by users. While interactions in social media such as Twitter occur in many natural languages, research on stance detection (the position or attitude expressed with respect to a specific topic) within the Natural Language Processing field has largely been done for English. Although some efforts have recently been made to develop annotated data in other languages, there is a telling lack of resources to facilitate multilingual and crosslingual research on stance detection. This is partially due to the fact that manually annotating a corpus of social media texts is a difficult, slow and costly process. Furthermore, as stance is a highly domain- and topic-specific phenomenon, the need for annotated data is especially demanding. As a result, most of the manually labeled resources are…
