Bridging Social Media via Distant Supervision
Walid Magdy, Hassan Sajjad, Tarek El-Ganainy, Fabrizio Sebastiani

TL;DR
This paper presents a method for classifying tweets by transferring labels from YouTube videos through distant supervision, significantly reducing manual labeling effort and improving classification performance across languages and class counts.
Contribution
It introduces a novel distant supervision approach for tweet classification by leveraging YouTube video labels, enabling scalable and cost-effective training data generation.
Findings
Automatically labelled data improves classifier performance.
The method is robust across multiple languages.
It remains effective with varying numbers of classes.
Abstract
Microblog classification has received a lot of attention in recent years. Different classification tasks have been investigated, most of them focusing on classifying microblogs into a small number of classes (five or less) using a training set of manually annotated tweets. Unfortunately, labelling data is tedious and expensive, and finding tweets that cover all the classes of interest is not always straightforward, especially when some of the classes do not frequently arise in practice. In this paper we study an approach to tweet classification based on distant supervision, whereby we automatically transfer labels from one social medium to another for a single-label multi-class classification task. In particular, we apply YouTube video classes to tweets linking to these videos. This provides for free a virtually unlimited number of labelled instances that can be used as training data.…
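The label-transfer step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_tweets`, the `video_categories` lookup table, and the URL pattern are all assumptions made for illustration. It also assumes tweets carry expanded YouTube URLs rather than shortened links, and that each video's category has already been collected into a dictionary.

```python
import re

# Hypothetical pattern: captures the 11-character video ID from two
# common YouTube URL shapes (youtube.com/watch?v=... and youtu.be/...).
YOUTUBE_ID = re.compile(
    r"(?:youtube\.com/watch\?(?:[^\s#]*&)?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)


def label_tweets(tweets, video_categories):
    """Return (tweet_text, label) pairs for tweets linking to a known video.

    tweets: iterable of tweet texts (assumed to contain expanded URLs).
    video_categories: dict mapping video ID -> category name (assumed
    to have been gathered beforehand).
    """
    labelled = []
    for text in tweets:
        match = YOUTUBE_ID.search(text)
        if match is None:
            continue  # no YouTube link, so no label can be transferred
        category = video_categories.get(match.group(1))
        if category is not None:
            # The video's class becomes the tweet's training label.
            labelled.append((text, category))
    return labelled


if __name__ == "__main__":
    sample_tweets = [
        "Great goal! https://www.youtube.com/watch?v=abcdefghijk",
        "no link in this one",
        "listen to this https://youtu.be/ABCDEFGHIJK",
    ]
    sample_categories = {"abcdefghijk": "Sports", "ABCDEFGHIJK": "Music"}
    for text, label in label_tweets(sample_tweets, sample_categories):
        print(label, "|", text)
```

Tweets without a recognisable link, or linking to a video with no known category, are simply skipped, so the resulting training set contains only automatically labelled instances.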
