MINION: a Large-Scale and Diverse Dataset for Multilingual Event Detection
Amir Pouran Ben Veyseh, Minh Van Nguyen, Franck Dernoncourt, and Thien Huu Nguyen

TL;DR
MINION is a large multilingual event detection dataset covering 8 languages, 5 of which were not supported by earlier multilingual ED datasets. It enables research on how hard ED is in different languages and how well ED knowledge transfers across them.
Contribution
This paper introduces MINION, a large-scale, diverse multilingual dataset for event detection across 8 languages, addressing gaps in existing datasets and supporting cross-lingual ED research.
Findings
Existing ED models vary in performance across languages.
Transfer learning shows promise for multilingual ED.
Multilingual ED presents unique challenges and opportunities.
Abstract
Event Detection (ED) is the task of identifying and classifying trigger words of event mentions in text. Despite considerable research efforts in recent years for English text, the task of ED in other languages has been significantly less explored. Switching to non-English languages, important research questions for ED include how well existing ED models perform on different languages, how challenging ED is in other languages, and how well ED knowledge and annotation can be transferred across languages. To answer those questions, it is crucial to obtain multilingual ED datasets that provide consistent event annotation for multiple languages. There exist some multilingual ED datasets; however, they tend to cover a handful of languages and mainly focus on popular ones. Many languages are not covered in existing multilingual ED datasets. In addition, the current datasets are often small…
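To make the task definition above concrete, here is a minimal sketch of ED framed as token-level labeling: each trigger word receives an event-type tag and every other token is tagged "O". The sentence, the "Attack" label, and the names EventMention and to_bio_tags are illustrative assumptions; they are not taken from the MINION annotation schema or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class EventMention:
    trigger: str      # trigger word as it appears in the sentence
    token_index: int  # position of the trigger in the token list
    event_type: str   # hypothetical label, not from the MINION schema


def to_bio_tags(tokens, mentions):
    """Turn trigger annotations into one tag per token ("O" = no event)."""
    tags = ["O"] * len(tokens)
    for mention in mentions:
        # Sanity check: the annotation must point at the trigger token.
        if tokens[mention.token_index] != mention.trigger:
            raise ValueError("trigger does not match token at given index")
        tags[mention.token_index] = "B-" + mention.event_type
    return tags


# Illustrative example sentence (an assumption, not dataset content).
tokens = "The troops attacked the village at dawn".split()
mentions = [EventMention(trigger="attacked", token_index=2, event_type="Attack")]
tags = to_bio_tags(tokens, mentions)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under this framing, an ED model is a sequence tagger that predicts the tag column from the token column, which is one common way to compare models across languages with a shared label set.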
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
