Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features

Muhammad Imran; Abdul Wahab Ziaullah; Kai Chen; Ferda Ofli

arXiv:2412.10413·cs.CL·December 17, 2024

TL;DR

This study evaluates the robustness of six large language models in processing disaster-related microblog data, revealing their strengths and weaknesses across different events, information types, and linguistic features, with proprietary models generally outperforming open-source ones.

Contribution

The paper provides a comprehensive benchmark of LLMs on crisis-related social media data, analyzing their generalizability, limitations, and the impact of linguistic features across multiple real-world events.

Findings

01

GPT-4o and GPT-4 generalize better than the other evaluated models across disasters and information types.

02

Most models struggle with flood-related data and critical information extraction.

03

Proprietary models outperform open-source models in all tasks.

Abstract

The widespread use of microblogging platforms like X (formerly Twitter) during disasters provides real-time information to governments and response authorities. However, the data from these platforms is often noisy, requiring automated methods to filter relevant information. Traditionally, supervised machine learning models have been used, but they lack generalizability. In contrast, Large Language Models (LLMs) show better capabilities in understanding and processing natural language out of the box. This paper provides a detailed analysis of the performance of six well-known LLMs in processing disaster-related social media data from a large set of real-world events. Our findings indicate that while LLMs, particularly GPT-4o and GPT-4, offer better generalizability across different disasters and information types, most LLMs face challenges in processing flood-related data, show minimal…

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Public Relations and Crisis Communication · Service-Oriented Architecture and Web Services

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax