Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems
Italo Luis da Silva, Hanqi Yan, Lin Gui, Yulan He

TL;DR
This paper introduces a reinforcement learning approach using evaluation models trained to approximate human judgment, improving causal event extraction robustness and reducing reliance on extensive annotations.
Contribution
The paper proposes a weak reward model, an evaluator trained on only a fraction of the annotated data, that is used in reinforcement learning to turn generative models into robust causal event extraction systems.
Findings
High agreement between evaluation models and human judgment
Effective transfer of evaluators across datasets
Weak-to-strong supervision with a fraction of the annotated data still yields high performance
Abstract
The inherent ambiguity of cause and effect boundaries poses a challenge in evaluating causal event extraction tasks. Traditional metrics like Exact Match and BERTScore poorly reflect model performance, so we trained evaluation models to approximate human evaluation, achieving high agreement. We used them to perform Reinforcement Learning with extraction models to align them with human preference, prioritising semantic understanding. We successfully explored our approach through multiple datasets, including transferring an evaluator trained on one dataset to another as a way to decrease the reliance on human-annotated data. In that vein, we also propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model while still achieving high performance in training an RL model. Our code is available at…
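To illustrate why boundary ambiguity hurts Exact Match, here is a minimal sketch. It is not the authors' code: the function names, the example spans, and the 0.5 threshold are all assumptions made for illustration, and `toy_evaluator` is a simple token-overlap stand-in for the trained evaluation model described in the abstract, whose real architecture is not given here.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into tokens."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def exact_match(pred: str, gold: str) -> float:
    """1.0 only if the normalized spans are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between predicted and gold spans."""
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def toy_evaluator(pred: str, gold: str, threshold: float = 0.5) -> float:
    """Hypothetical stand-in for a learned evaluator: returns a binary
    'valid extraction' reward. The threshold is an arbitrary assumption."""
    return float(token_f1(pred, gold) >= threshold)


# Hypothetical example: the prediction differs from the gold cause span
# only by a leading article, a typical boundary disagreement.
gold_cause = "heavy rain flooded the city"
pred_cause = "The heavy rain flooded the city"

print(exact_match(pred_cause, gold_cause))    # 0.0
print(toy_evaluator(pred_cause, gold_cause))  # 1.0
```

In an RL setup of the kind the abstract describes, the evaluator's score would play the role of the reward for each generated extraction; here the binary output of `toy_evaluator` only mimics that interface.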
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Fault Detection and Control Systems
Methods: ALIGN
