Short text classification with machine learning in the social sciences: The case of climate change on Twitter

Karina Shyrokykh; Maksym Girnyk; Lisa Dellmuth

arXiv:2310.04452·cs.CL·October 10, 2023

TL;DR

This study evaluates machine learning classifiers for short text classification in the social sciences, using Twitter data on climate change as the test case. It finds that traditional methods perform comparably to deep learning while needing less training time and compute, which makes them a good fit for resource-constrained research settings.

Contribution

The paper compares the performance of various machine learning classifiers on the small, imbalanced datasets typical of social science research, and demonstrates the efficiency of traditional methods.

Findings

01. Supervised machine learning outperforms lexicons, especially when classes are more balanced.

02. Traditional methods like logistic regression and random forest perform comparably to deep learning.

03. Traditional methods require less training time and computational resources.
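To make the lexicon baseline and the class-imbalance issue in these findings concrete, here is a minimal sketch in plain Python. It is not the authors' code: the keyword list, the toy tweets, and the function names `lexicon_predict` and `precision_recall_f1` are all made up for illustration, and the paper's actual evaluation metrics are not stated in this summary. Precision, recall and F1 on the rare class are used here only because they are a common choice when the category of interest is infrequent.

```python
from typing import FrozenSet, Sequence, Tuple

# Hypothetical keyword lexicon (an assumption, not taken from the paper).
CLIMATE_LEXICON: FrozenSet[str] = frozenset(
    {"climate", "warming", "emissions", "carbon"}
)


def lexicon_predict(text: str, lexicon: FrozenSet[str] = CLIMATE_LEXICON) -> int:
    """Return 1 if any lexicon term appears as a token in the text, else 0."""
    tokens = {tok.strip(".,!?#@:;\"'").lower() for tok in text.split()}
    return int(bool(tokens & lexicon))


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the chosen (typically rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, invented examples: 2 of 8 tweets belong to the category of interest.
labeled_tweets = [
    ("Cutting emissions is our top priority this decade", 1),
    ("Rising seas threaten coastal communities", 1),
    ("Warming up before the marathon this weekend", 0),
    ("Board meeting scheduled for Monday", 0),
    ("New trade agreement signed in Geneva", 0),
    ("Happy holidays from our whole team", 0),
    ("Vaccination campaign reaches rural areas", 0),
    ("Join our webinar on digital trade", 0),
]

y_true = [label for _, label in labeled_tweets]
y_pred = [lexicon_predict(text) for text, _ in labeled_tweets]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# A classifier that always predicts the majority class, for comparison.
always_negative = [0] * len(y_true)
baseline_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
baseline_f1 = precision_recall_f1(y_true, always_negative)[2]

print(f"lexicon:         accuracy={accuracy:.2f} f1={f1:.2f}")
print(f"always-negative: accuracy={baseline_accuracy:.2f} f1={baseline_f1:.2f}")
```

On this toy data the lexicon and the always-negative baseline reach the same accuracy, yet only the lexicon finds any relevant tweet. That is why a rare-class metric is more informative than accuracy in this kind of setting. A supervised classifier would replace `lexicon_predict` with a model trained on the labeled tweets.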

Abstract

To analyse large numbers of texts, social science researchers are increasingly confronting the challenge of text classification. When manual labeling is not possible and researchers have to find automatized ways to classify texts, computer science provides a useful toolbox of machine-learning methods whose performance remains understudied in the social sciences. In this article, we compare the performance of the most widely used text classifiers by applying them to a typical research scenario in social science research: a relatively small labeled dataset with infrequent occurrence of categories of interest, which is a part of a large unlabeled dataset. As an example case, we look at Twitter communication regarding climate change, a topic of increasing scholarly interest in interdisciplinary social science research. Using a novel dataset including 5,750 tweets from various international…

Code & Models

Repositories

shikarina/short_text_classification (tf, Official)

Taxonomy

Methods: Logistic Regression