TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction
Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng, Heng Ji

TL;DR
This paper introduces TextEE, a standardized benchmark for event extraction that addresses evaluation issues, enabling fair comparison of methods and highlighting current challenges in the field.
Contribution
The paper presents the first comprehensive, standardized benchmark for event extraction, with unified data preprocessing, data splits, and a reevaluation of recent methods and large language models.
Findings
Existing evaluations are inconsistent because of varying data assumptions and preprocessing steps, and they can introduce dataset or data-split bias.
Recent models struggle to achieve high performance on TextEE.
The benchmark reveals significant challenges in current event extraction approaches.
Abstract
Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches. To address these challenges, we present TextEE, a standardized, fair, and reproducible benchmark for event extraction. TextEE comprises standardized data preprocessing scripts and splits for 16 datasets spanning eight diverse domains and includes 14 recent methodologies, conducting a comprehensive benchmark reevaluation. We also evaluate five varied large language models…
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Time Series Analysis and Forecasting
