REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction

Omar Sharif; Joseph Gatto; Madhusudan Basak; Sarah M. Preum

arXiv:2502.16838 · cs.CL · September 11, 2025

TL;DR

REGen is a new evaluation framework for generative event argument extraction that combines multiple matching strategies to better reflect true model performance, especially for large language models, and aligns well with human judgment.

Contribution

The paper introduces REGen, an evaluation method that extends exact match with relaxed and LLM-based matching so that scores reflect the performance of generative models more accurately.

Findings

01 REGen shows an average performance gain of +23.93 F1 over EM.

02 REGen achieves 87.67% alignment with human judgment.

03 Experiments on six datasets demonstrate REGen's effectiveness.

Abstract

Event argument extraction identifies arguments for predefined event roles in text. Existing work evaluates this task with exact match (EM), where predicted arguments must align exactly with annotated spans. While suitable for span-based models, this approach falls short for large language models (LLMs), which often generate diverse yet semantically accurate arguments. EM severely underestimates performance by disregarding valid variations. Furthermore, EM evaluation fails to capture implicit arguments (unstated but inferable) and scattered arguments (distributed across a document). These limitations underscore the need for an evaluation framework that better captures models' actual performance. To bridge this gap, we introduce REGen, a Reliable Evaluation framework for Generative event argument extraction. REGen combines the strengths of exact, relaxed, and LLM-based matching to better…
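To make the idea of layered matching concrete, here is a minimal Python sketch of how exact, relaxed, and judge-based matching could be cascaded when scoring predicted arguments. It is an illustration under stated assumptions, not the REGen implementation: the normalization rules in `normalize`, the greedy one-to-one pairing in `f1`, and the `toy_judge` function (a stand-in for an LLM call) are all hypothetical choices that the abstract does not specify.

```python
import re
import string
from typing import Callable, List, Optional, Set

Judge = Callable[[str, str], bool]


def normalize(text: str) -> str:
    """Assumed relaxation: lowercase, drop punctuation and English articles."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def match_level(pred: str, gold: str, judge: Optional[Judge] = None) -> Optional[str]:
    """Return the cheapest level at which pred matches gold, or None."""
    if pred == gold:
        return "exact"
    if normalize(pred) == normalize(gold):
        return "relaxed"
    if judge is not None and judge(pred, gold):
        return "judge"
    return None


def f1(preds: List[str], golds: List[str], accepted: Set[str],
       judge: Optional[Judge] = None) -> float:
    """F1 with greedy one-to-one pairing of predictions to gold arguments."""
    remaining = list(golds)
    true_positives = 0
    for pred in preds:
        for gold in remaining:
            if match_level(pred, gold, judge) in accepted:
                remaining.remove(gold)  # each gold argument is used once
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def toy_judge(pred: str, gold: str) -> bool:
    """Hypothetical stand-in for an LLM judge: gold tokens contained in pred."""
    return set(normalize(gold).split()) <= set(normalize(pred).split())


# Invented example data, for illustration only.
preds = ["The protesters", "police officers", "city hall"]
golds = ["protesters", "police", "City Hall"]

em_f1 = f1(preds, golds, {"exact"})
relaxed_f1 = f1(preds, golds, {"exact", "relaxed"})
full_f1 = f1(preds, golds, {"exact", "relaxed", "judge"}, judge=toy_judge)
print(em_f1, relaxed_f1, full_f1)
```

On this invented example, exact match credits none of the three predictions, the relaxed level credits two, and the judge level credits all three, which mirrors the abstract's point that EM can underestimate semantically correct outputs. Running the cheaper checks first also means the judge is consulted only for pairs the earlier levels cannot resolve.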

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques