Cross-Modal Rationale Transfer for Explainable Humanitarian Classification on Social Media
Thi Huyen Nguyen, Koustav Rudra, Wolfgang Nejdl

TL;DR
This paper introduces a multimodal classification framework that transfers rationales between text and images to improve explainability and accuracy in humanitarian crisis detection on social media.
Contribution
It proposes a cross-modal rationale transfer method using visual language transformers, enhancing interpretability and reducing annotation effort in multimodal crisis classification.
Findings
Improves Macro-F1 by 2-35% on the CrisisMMD dataset
Retrieves better image rationales, a 12% improvement
Achieves 80% accuracy in zero-shot mode on unseen data
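For readers unfamiliar with the headline metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so rare humanitarian categories count as much as frequent ones. The sketch below is a minimal, dependency-free illustration of that definition; the function name `macro_f1` and the class labels are hypothetical and are not taken from the paper or from CrisisMMD.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only (hypothetical, not the paper's data).
y_true = ["damage", "damage", "other", "other"]
y_pred = ["damage", "other", "other", "other"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a gain in Macro-F1 usually reflects better performance on under-represented categories rather than only on the majority class.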
Abstract
Advances in social media data dissemination enable the provision of real-time information during a crisis. The information falls into different classes, such as infrastructure damage, persons missing or stranded in the affected zone, etc. Existing methods have attempted to classify text and images into various humanitarian categories, but their decision-making process remains largely opaque, which hinders their deployment in real-life applications. Recent work has sought to improve transparency by extracting textual rationales from tweets to explain predicted classes. However, such explainable classification methods have mostly focused on text rather than on crisis-related images. In this paper, we propose an interpretable-by-design multimodal classification framework. Our method first learns the joint representation of text and image using a visual language transformer model and extracts…
Taxonomy
Topics: Misinformation and Its Impacts · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining
