HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response
Selim Fekih, Nicolò Tamagnone, Benjamin Minixhofer, Ranjan Shrestha, Ximena Contla, Ewan Oglethorpe, Navid Rekabsaz

TL;DR
HumSet is a comprehensive multilingual dataset of humanitarian response documents annotated for information extraction and classification, facilitating the development of NLP tools for crisis response in English, French, and Spanish.
Contribution
We introduce HumSet, a new annotated multilingual dataset for humanitarian crisis analysis, including novel extraction and classification tasks with baseline experiments using pre-trained language models.
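As a rough illustration of what a multi-label entry classification target can look like, the sketch below encodes each entry's set of classes as a multi-hot vector. The field names (`text`, `labels`), the example sentences, and the class names are hypothetical placeholders chosen for illustration; they are not taken from the HumSet release or its annotation frameworks.

```python
from typing import Dict, List, Sequence

# Hypothetical entries: each is a text snippet with zero or more class labels.
# The label names below are illustrative, not HumSet's actual tag set.
entries = [
    {"text": "Clinics in the region report shortages of medicine.",
     "labels": ["Health"]},
    {"text": "Displaced families lack food and safe shelter.",
     "labels": ["Food Security", "Shelter"]},
]


def build_label_index(items: Sequence[dict]) -> Dict[str, int]:
    """Map every label seen in the data to a stable column index."""
    labels = sorted({label for item in items for label in item["labels"]})
    return {label: i for i, label in enumerate(labels)}


def to_multi_hot(labels: Sequence[str], label_index: Dict[str, int]) -> List[int]:
    """Encode a set of labels as a 0/1 vector, one position per known label."""
    vector = [0] * len(label_index)
    for label in labels:
        vector[label_index[label]] = 1
    return vector


label_index = build_label_index(entries)
vectors = [to_multi_hot(e["labels"], label_index) for e in entries]
```

Because an entry may carry several classes at once, a classifier trained on such targets would typically predict each position independently rather than pick a single class.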
Findings
Baseline models achieve promising results on extraction tasks.
Multilingual data improves the robustness of humanitarian NLP systems.
HumSet enables future research in multilingual crisis response NLP.
Abstract
Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable the creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry…
Taxonomy
Topics: Disaster Management and Resilience · Topic Modeling · Viral Infections and Outbreaks Research
