LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

Jacob Ativo; Bharaneeshwar Balasubramaniyam; Anh Tran; Khushboo Gupta; Hongmin Li; Doina Caragea; Cornelia Caragea

arXiv:2605.08448 · cs.AI · May 12, 2026


TL;DR

This paper evaluates LLM-guided semi-supervised methods for classifying crisis-related social media data, demonstrating their effectiveness in low-resource scenarios and highlighting the potential for deploying smaller models in disaster response.

Contribution

The paper presents the first empirical comparison of LLM-guided semi-supervised approaches, namely VerifyMatch and LG-CoTrain, for crisis tweet classification, and shows that they outperform or match classical semi-supervised baselines, especially when labeled data is scarce.
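
Self-Training, one of the classical baselines named in the abstract, grows the training set by assigning pseudo-labels to unlabeled tweets that the current model classifies with high confidence. The sketch below illustrates only that generic selection step, assuming per-tweet class probabilities and a fixed confidence threshold. It is not the implementation of VerifyMatch, LG-CoTrain, or the authors' baselines; the function name `select_pseudo_labels`, the threshold value, and the class names are hypothetical.

```python
from typing import List, Tuple


def select_pseudo_labels(
    probs: List[List[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (example_index, predicted_class) pairs whose top-class
    probability meets the confidence threshold (hypothetical helper)."""
    selected = []
    for i, p in enumerate(probs):
        # Predicted class is the argmax of the probability vector.
        label = max(range(len(p)), key=lambda c: p[c])
        # Keep only confident predictions as pseudo-labels.
        if p[label] >= threshold:
            selected.append((i, label))
    return selected


# Hypothetical class probabilities for three unlabeled tweets
# (assumed classes: 0 = not informative, 1 = infrastructure damage, 2 = rescue needs).
unlabeled_probs = [
    [0.05, 0.92, 0.03],  # confident: pseudo-labeled as class 1
    [0.40, 0.35, 0.25],  # uncertain: left unlabeled this round
    [0.97, 0.02, 0.01],  # confident: pseudo-labeled as class 0
]
print(select_pseudo_labels(unlabeled_probs, threshold=0.9))
```

In a full self-training loop, the selected tweets would be added to the labeled pool and the model retrained, repeating until no new confident predictions appear; the threshold trades pseudo-label quantity against noise.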

Findings

1. LG-CoTrain outperforms classical semi-supervised methods in low-resource settings.
2. VerifyMatch shows strong calibration and competitive performance.
3. Smaller semi-supervised models can outperform large LLMs applied in a zero-shot setting.
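
The source does not say which metric underlies the calibration finding. One widely used measure is expected calibration error (ECE), which groups predictions into confidence bins and takes the size-weighted average gap between accuracy and mean confidence in each bin; lower is better. The following is a minimal sketch assuming equal-width bins and per-tweet (confidence, correctness) pairs; the function name and example numbers are hypothetical.

```python
from typing import List


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE:
    sum over bins of (bin size / N) * |bin accuracy - bin mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin membership is (lo, hi]; a confidence of exactly 0.0 goes to the first bin.
        idx = [
            i for i, c in enumerate(confidences)
            if (lo < c <= hi) or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece


# Hypothetical predictions: top-class confidence and whether the prediction was right.
preds_confidence = [0.95, 0.95, 0.65, 0.65]
preds_correct = [True, True, True, False]
print(round(expected_calibration_error(preds_confidence, preds_correct), 4))  # prints 0.1
```

Here the two high-confidence predictions are slightly underconfident (accuracy 1.0 vs. confidence 0.95) and the two mid-confidence ones are overconfident (accuracy 0.5 vs. confidence 0.65), giving a weighted gap of about 0.1.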

Abstract

Semi-supervised learning approaches have been investigated as a means to enhance the analysis of social media data in disaster management contexts. In this work, we present the first empirical evaluation of large language model (LLM)-guided semi-supervised learning for crisis-related tweet classification. We compare two recent LLM-assisted semi-supervised methods, VerifyMatch and LLM-guided Co-Training (LG-CoTrain), against established semi-supervised baselines. Our results show that LG-CoTrain significantly outperforms classical semi-supervised approaches in low-resource settings with 5, 10, and 25 labeled examples per class, achieving the highest averaged Macro F1 across events. VerifyMatch achieves competitive performance while also demonstrating strong calibration properties. As the number of labeled examples increases, the performance gap narrows and Self-Training emerges as a…
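
The abstract ranks methods by Macro F1 averaged across events. Macro F1 is the unweighted mean of per-class F1 scores, so rare crisis categories count as much as frequent ones. Assuming "averaged across events" means the simple mean of each event's Macro F1 (the source does not spell this out), the computation can be sketched as follows; the event names, labels, and helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    f1_scores: List[float] = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is taken as 0 when both precision and recall are 0.
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def averaged_macro_f1(results: Dict[str, Tuple[List[str], List[str]]]) -> float:
    """Simple mean of per-event Macro F1 (assumed averaging scheme)."""
    scores = [macro_f1(gold, pred) for gold, pred in results.values()]
    return sum(scores) / len(scores)


# Hypothetical gold/predicted labels for two crisis events.
results = {
    "event_a": (["rescue", "damage", "other", "other"],
                ["rescue", "other", "other", "other"]),
    "event_b": (["damage", "damage"], ["damage", "damage"]),
}
for event, (gold, pred) in results.items():
    print(event, round(macro_f1(gold, pred), 3))
print("averaged", round(averaged_macro_f1(results), 3))
```

In `event_a`, the missed "damage" tweet drives that class's F1 to 0, which pulls the event's Macro F1 down to 0.6 even though three of four tweets are correct; this sensitivity to minority classes is why Macro F1 is a natural choice for imbalanced crisis data.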


Code & Models

Repositories

GitHub
