Differential Attention for Multimodal Crisis Event Analysis

Nusrat Munia; Junfeng Zhu; Olfa Nasraoui; Abdullah-Al-Zubaer Imran

arXiv:2507.05165 · cs.CV · July 8, 2025

TL;DR

This paper introduces a novel multimodal fusion approach using Differential Attention and Guided Cross Attention with vision-language models to improve crisis event classification from noisy social media data.

Contribution

It proposes combining CLIP embeddings, LLaVA-generated text, and adaptive fusion strategies to enhance multimodal crisis data analysis without task-specific fine-tuning.

Findings

01 Outperforms traditional models on the CrisisMMD dataset

02 Differential Attention improves classification accuracy

03 Guided Cross Attention effectively aligns multimodal features

Abstract

Social networks can be a valuable source of information during crisis events. In particular, users can post a stream of multimodal data that can be critical for real-time humanitarian response. However, extracting meaningful information from this large and noisy data stream and integrating heterogeneous data remain formidable challenges. In this work, we explore vision language models (VLMs) and advanced fusion strategies to enhance the classification of crisis data in three different tasks. We incorporate LLaVA-generated text to improve text-image alignment. Additionally, we leverage Contrastive Language-Image Pretraining (CLIP)-based vision and text embeddings, which, without task-specific fine-tuning, outperform traditional models. To further refine multimodal fusion, we employ Guided Cross Attention (Guided CA) and combine it with the Differential Attention…
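The abstract names Differential Attention but is cut off before describing it. As a rough illustration only, the sketch below follows the general form used in the Differential Transformer literature: two softmax attention maps are computed from split query/key projections, and the second is subtracted from the first, scaled by a factor lambda, with the aim of cancelling attention mass that both maps assign to irrelevant tokens. Whether the paper uses exactly this form is not stated in this excerpt. The function name, the weight shapes, and the fixed scalar `lam` are assumptions made for the example (published variants typically learn lambda and add normalization), and NumPy is used only to keep the sketch small.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(x, w_q, w_k, w_v, lam=0.5):
    """Single-head differential attention (illustrative sketch, not the paper's code).

    x:   (n, d_model) token features, e.g. fused image/text embeddings
    w_q: (d_model, 2 * d) query projection, split into two halves
    w_k: (d_model, 2 * d) key projection, split into two halves
    w_v: (d_model, d_v)   value projection
    lam: scalar weight on the second attention map (fixed here by assumption)
    """
    q1, q2 = np.split(x @ w_q, 2, axis=-1)
    k1, k2 = np.split(x @ w_k, 2, axis=-1)
    v = x @ w_v
    d = q1.shape[-1]

    # Two ordinary scaled dot-product attention maps.
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))

    # The difference of the two maps is the "differential" attention weight.
    weights = a1 - lam * a2
    return weights @ v, weights


# Toy usage with random inputs (hypothetical sizes).
rng = np.random.default_rng(0)
n, d_model, d = 4, 8, 3
x = rng.normal(size=(n, d_model))
w_q = rng.normal(size=(d_model, 2 * d))
w_k = rng.normal(size=(d_model, 2 * d))
w_v = rng.normal(size=(d_model, d_model))
out, weights = differential_attention(x, w_q, w_k, w_v, lam=0.5)
```

Because each softmax row sums to 1, every row of `weights` sums to `1 - lam`, and individual entries can be negative. Setting `lam=0` recovers standard scaled dot-product attention over the first query/key halves.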

Code & Models

Repositories

munia03/multimodal_crisis_event (PyTorch, official)

Taxonomy

Methods: Sparse Evolutionary Training