EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

Praval Sharma, Ashok Samal, Leen-Kiat Soh, and Deepti Joshi

arXiv:2604.21890 · cs.CL · April 24, 2026

TL;DR

This paper introduces EVENT5Ws, a large open-domain event extraction dataset, and evaluates how well current models perform on it, highlighting its potential for advancing generalizable event extraction algorithms.

Contribution

The creation of a large, manually annotated open-domain event extraction dataset and a benchmark for evaluating and improving event extraction models.

Findings

1. Models trained on EVENT5Ws generalize across different geographical datasets.
2. The dataset provides empirical insights into annotation complexity.
3. Benchmark results show the capabilities and limitations of current pre-trained large language models.

Abstract

Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Automated event extraction approaches are therefore needed. However, existing datasets for developing such algorithms have limitations: closed-domain datasets cover only a limited range of event types, and open-domain settings lack a large, manually verified dataset. To address these limitations, we create EVENT5Ws, a large, manually annotated, and statistically verified open-domain event extraction dataset. We design a systematic annotation pipeline to create the dataset and provide empirical insights into annotation complexity. Using EVENT5Ws, we evaluate state-of-the-art pre-trained large language models and establish a benchmark for future research. We further show that…