MAVEN: A Massive General Domain Event Detection Dataset
Xiaozhi Wang, Ziqi Wang, Xu Han, Wangyi Jiang, Rong Han, Zhiyuan Liu, Juanzi Li, Peng Li, Yankai Lin, Jie Zhou

TL;DR
MAVEN is a large-scale, diverse dataset for event detection (ED) in general-domain text. It is designed to address the data scarcity and low event-type coverage of existing datasets and to support the development of more robust ED models.
Contribution
The paper introduces MAVEN, a massive dataset of 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types, which enables better training and more stable benchmarking of event detection models.
Findings
Existing ED models perform worse on MAVEN than on the existing small-scale datasets.
ED remains a challenging task in real-world scenarios.
Further research is needed to improve ED methods for general-domain applications.
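This summary does not say how ED performance is measured. A common convention in event detection work, assumed here rather than taken from this page, counts a predicted mention as correct only when both its trigger span and its event type match a gold mention, and reports micro-averaged precision, recall, and F1. The sketch below implements that scoring; the `(sentence_id, start, end, event_type)` tuple and the function name `micro_prf` are hypothetical choices for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical mention key: (sentence_id, start_token, end_token, event_type).
Mention = Tuple[str, int, int, str]


def micro_prf(gold: Iterable[Mention], pred: Iterable[Mention]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact (span, type) matches."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)  # span AND type must both agree
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = [("s1", 3, 4, "Attack"), ("s1", 7, 8, "Motion"), ("s2", 0, 1, "Arrest")]
    # Second prediction has the right span but the wrong type; third is spurious.
    pred = [("s1", 3, 4, "Attack"), ("s1", 7, 8, "Arriving"), ("s2", 5, 6, "Death")]
    print(micro_prf(gold, pred))
```

Because a correct span with the wrong type counts as a miss, a dataset with many fine-grained types (168 in MAVEN) makes the classification half of the task noticeably harder to score well on.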
Abstract
Event detection (ED), which involves identifying event trigger words and classifying their event types, is the first and most fundamental step in extracting event knowledge from plain text. Most existing datasets exhibit the following issues that limit further development of ED: (1) Data scarcity. Existing small-scale datasets are not sufficient for training and stably benchmarking increasingly sophisticated modern neural methods. (2) Low coverage. The limited event types of existing datasets cannot adequately cover general-domain events, which restricts the applications of ED models. To alleviate these problems, we present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types. MAVEN alleviates the data scarcity problem and covers much more general event types. We reproduce the recent state-of-the-art ED models and…
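The task definition and statistics above can be made concrete with a small sketch. It assumes a hypothetical JSON-lines layout in which each document lists its events, each event carries one type label, and each event has one or more trigger mentions (trigger word, sentence index, token offsets). The field names and the function `corpus_stats` are illustrative and may not match the released MAVEN files. The function computes the same three kinds of statistics the abstract reports: documents, event mention instances, and event types.

```python
import json
from typing import Dict, Iterable


def corpus_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count documents, event mentions, and distinct event types.

    Each line is one JSON document in a hypothetical layout, e.g.
    {"id": "d1", "events": [{"type": "Attack",
        "mentions": [{"trigger": "stormed", "sent_id": 0, "offset": [2, 3]}]}]}
    """
    n_docs = 0
    n_mentions = 0
    event_types = set()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        doc = json.loads(line)
        n_docs += 1
        for event in doc.get("events", []):
            event_types.add(event["type"])  # every event carries one type label
            n_mentions += len(event.get("mentions", []))  # each trigger is one instance
    return {"documents": n_docs, "mentions": n_mentions, "event_types": len(event_types)}


if __name__ == "__main__":
    # Toy documents with illustrative type labels.
    toy_corpus = [
        json.dumps({"id": "d1", "events": [
            {"type": "Attack", "mentions": [
                {"trigger": "stormed", "sent_id": 0, "offset": [2, 3]},
                {"trigger": "assault", "sent_id": 1, "offset": [5, 6]}]}]}),
        json.dumps({"id": "d2", "events": [
            {"type": "Arrest", "mentions": [
                {"trigger": "detained", "sent_id": 0, "offset": [1, 2]}]}]}),
    ]
    print(corpus_stats(toy_corpus))  # {'documents': 2, 'mentions': 3, 'event_types': 2}
```

Grouping trigger mentions under a typed event mirrors the two halves of ED named above: a model must locate each trigger span (identification) and assign it the right event type (classification).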
