# Extracting Victim Counts from Text

**Authors:** Mian Zhong, Shehzaad Dhuliawala, Niklas Stoehr

arXiv: 2302.12367 · 2023-02-27

## TL;DR

This paper frames victim count extraction from crisis reports as a question answering task, compares regex, dependency parsing, semantic role labeling, and text-to-text models on it, and analyzes their reliability and robustness.

## Contribution

The paper introduces a QA-based framework for victim count extraction with a regression or classification objective, evaluates multiple approaches including numeracy-focused large language models, and recommends which model to select for different desiderata and data domains.

## Key findings

- QA framing improves extraction accuracy
- Advanced models outperform regex and dependency parsing
- Models show varying robustness in out-of-distribution scenarios

## Abstract

Decision-makers in the humanitarian sector rely on timely and exact information during crisis events. Knowing how many civilians were injured during an earthquake is vital to allocate aid properly. Information about such victim counts is often only available within full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers have different formats and may require numeric reasoning. This renders purely string matching-based approaches insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare regex, dependency parsing, semantic role labeling-based approaches, and advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.
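To make the abstract's point about string matching concrete, here is a minimal sketch of a regex baseline for injury counts. It is an illustration only, not the authors' implementation: the trigger words, the pattern, and the function name `regex_injured_count` are all assumptions made for this example.

```python
import re

# Hypothetical trigger words for one victim type; not taken from the paper.
INJURY_WORDS = r"(?:injured|wounded|hurt)"

# A digit-based number (optionally with thousands separators), followed
# within at most three intervening words by an injury trigger word.
PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\b(?:\s+\w+){0,3}?\s+" + INJURY_WORDS + r"\b",
    re.IGNORECASE,
)


def regex_injured_count(text):
    """Return the first digit-based injury count found in text, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Works when the count is written in digits next to a trigger word.
print(regex_injured_count("At least 1,200 people were injured in the earthquake."))

# Fails on spelled-out numbers: nothing is extracted.
print(regex_injured_count("Twelve civilians were wounded."))

# Fails on numeric reasoning: only the digit-based mention is returned,
# although the text describes seven victims in total.
print(regex_injured_count("Three were hurt and 4 more were wounded later."))
```

The three calls return `1200`, `None`, and `4`. The second and third cases show the two difficulties the abstract names: differing number formats and the need for numeric reasoning.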

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.12367/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/2302.12367/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/2302.12367/full.md

---
Source: https://tomesphere.com/paper/2302.12367