GLEN: General-Purpose Event Detection for Thousands of Types
Qiusi Zhan, Sha Li, Kathryn Conger, Martha Palmer, Heng Ji, Jiawei Han

TL;DR
This paper introduces GLEN, a large-scale, general-purpose event detection dataset with 205K event mentions across 3,465 event types. It also introduces CEDAR, a new multi-stage event detection model that outperforms a range of baselines.
Contribution
The paper presents GLEN, a wide-coverage event detection dataset built by using existing PropBank annotation as distant supervision. It also presents CEDAR, a model designed specifically to handle GLEN's large ontology.
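The summary does not describe CEDAR's stages. The sketch below is therefore only a generic coarse-to-fine pattern and is not CEDAR itself. It shows why splitting detection into stages helps when there are thousands of candidate types. A cheap scorer ranks every type, and only a short list of top candidates is passed to a more expensive scorer. The scorers, type names, descriptions, and the function `classify_two_stage` are all hypothetical. A real system would use trained models in place of the two token-overlap heuristics.

```python
import heapq
from typing import Callable, Dict

Scorer = Callable[[str, str], float]


def token_overlap(context: str, description: str) -> float:
    """Cheap first-stage scorer (hypothetical): count of shared lowercase tokens."""
    return float(len(set(context.lower().split()) & set(description.lower().split())))


def jaccard(context: str, description: str) -> float:
    """Stand-in for an expensive second-stage scorer (hypothetical)."""
    a, b = set(context.lower().split()), set(description.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_two_stage(
    context: str,
    type_descriptions: Dict[str, str],
    cheap: Scorer,
    expensive: Scorer,
    k: int = 2,
) -> str:
    # Stage 1: score every type with the cheap scorer and keep only the top k.
    shortlist = heapq.nlargest(
        k, type_descriptions, key=lambda t: cheap(context, type_descriptions[t])
    )
    # Stage 2: run the expensive scorer only on the k shortlisted types.
    return max(shortlist, key=lambda t: expensive(context, type_descriptions[t]))


# Toy ontology: in a GLEN-sized setting this dict would hold thousands of types.
TYPES = {
    "attack": "a violent attack on a person or place",
    "election": "people vote to elect a person to office",
    "meeting": "people meet in a place to talk",
}

print(classify_two_stage("rebels launched an attack on the town", TYPES, token_overlap, jaccard))
```

The design choice is about cost. The expensive scorer runs k times per mention instead of once per type. This matters most when the ontology is large.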
Findings
CEDAR outperforms baseline models including InstructGPT.
Label noise remains a significant challenge.
GLEN covers 3,465 event types, more than 20x larger in ontology than today's largest event dataset.
Abstract
The progress of event extraction research has been hindered by the absence of wide-coverage, large-scale datasets. To make event extraction systems more accessible, we build a general-purpose event detection dataset GLEN, which covers 205K event mentions with 3,465 different types, making it more than 20x larger in ontology than today's largest event dataset. GLEN is created by utilizing the DWD Overlay, which provides a mapping between Wikidata Qnodes and PropBank rolesets. This enables us to use the abundant existing annotation for PropBank as distant supervision. In addition, we also propose a new multi-stage event detection model CEDAR specifically designed to handle the large ontology size in GLEN. We show that our model exhibits superior performance compared to a range of baselines including InstructGPT. Finally, we perform error analysis and show that label noise is still the…
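The distant-supervision step described in the abstract can be sketched as a relabeling pass. Each PropBank predicate annotation is converted into an event mention whenever its roleset maps to a Wikidata Qnode. This is a minimal sketch under stated assumptions and not the authors' pipeline. The two-entry mapping stands in for the DWD Overlay, and the Qnode IDs are placeholders rather than real Wikidata entries. The dataclasses and `distant_event_labels` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical excerpt standing in for the DWD Overlay's roleset-to-Qnode mapping.
# Roleset IDs follow PropBank naming; the Qnode IDs are placeholders, not real entries.
ROLESET_TO_QNODE: Dict[str, str] = {
    "attack.01": "Q_ATTACK",
    "elect.01": "Q_ELECTION",
}


@dataclass(frozen=True)
class PredicateAnnotation:
    sentence_id: int
    trigger: str
    roleset: str  # PropBank roleset ID, e.g. "attack.01"


@dataclass(frozen=True)
class EventMention:
    sentence_id: int
    trigger: str
    event_type: str  # Qnode ID from the mapping


def distant_event_labels(
    annotations: List[PredicateAnnotation], mapping: Dict[str, str]
) -> List[EventMention]:
    """Relabel PropBank predicate annotations as event mentions.

    Predicates whose roleset has no mapped Qnode are dropped. Any error in the
    mapping is copied directly into the resulting labels.
    """
    mentions = []
    for ann in annotations:
        qnode = mapping.get(ann.roleset)
        if qnode is not None:
            mentions.append(EventMention(ann.sentence_id, ann.trigger, qnode))
    return mentions


anns = [
    PredicateAnnotation(0, "attacked", "attack.01"),
    PredicateAnnotation(0, "said", "say.01"),  # no mapping, so it is dropped
    PredicateAnnotation(1, "elected", "elect.01"),
]
print(distant_event_labels(anns, ROLESET_TO_QNODE))
```

No manual event annotation is needed here, because the labels come entirely from existing predicate annotation plus the mapping. The trade-off is that a wrong or overly coarse mapping entry can silently mislabel every mention of that roleset.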
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Semantic Web and Ontologies
Methods: Ontology
