Consistent Document-Level Relation Extraction via Counterfactuals
Ali Modarressi, Abdullatif Köksal, Hinrich Schütze

TL;DR
This paper introduces CovEReD, a counterfactual data generation method for document-level relation extraction, which improves model consistency by reducing reliance on spurious signals without harming overall performance.
Contribution
The paper presents a novel counterfactual data augmentation approach for document-level RE that enhances model consistency and reduces bias from spurious correlations.
Findings
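Models trained on factual data behave inconsistently after entity replacement: they extract triples accurately from factual documents but miss the same triples in the counterfactual versions.
Training on counterfactual data improves consistency with minimal loss in overall performance.
The Re-DocRED-CF dataset enables evaluation of model robustness against factual biases.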
Abstract
Many datasets have been developed to train and evaluate document-level relation extraction (RE) models. Most of these are constructed using real-world data. It has been shown that RE models trained on real-world data suffer from factual biases. To evaluate and address this issue, we present CovEReD, a counterfactual data generation approach for document-level relation extraction datasets using entity replacement. We first demonstrate that models trained on factual data exhibit inconsistent behavior: while they accurately extract triples from factual data, they fail to extract the same triples after counterfactual modification. This inconsistency suggests that models trained on factual data rely on spurious signals such as specific entities and external knowledge rather than on the input context to extract triples. We show that by generating…
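The entity-replacement idea and the consistency check described in the abstract can be illustrated with a minimal sketch. This is not the authors' CovEReD implementation: the function names `replace_entity` and `consistency`, the whole-word string matching, and the exact definition of the consistency score are all assumptions made for illustration, and the example sentence is invented.

```python
import re


def replace_entity(text, triples, old, new):
    """Swap every whole-word mention of `old` with `new` in the text and
    in the (head, relation, tail) triples. Hypothetical helper: a real
    pipeline would likely also constrain replacements by entity type."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    new_text = pattern.sub(new, text)
    new_triples = [
        (new if h == old else h, r, new if t == old else t)
        for (h, r, t) in triples
    ]
    return new_text, new_triples


def consistency(factual_pred, factual_gold, counterfactual_pred, mapping):
    """Assumed metric: among triples the model extracted correctly from the
    factual document, the fraction it still extracts (with entities mapped)
    from the counterfactual document."""
    correct = set(factual_pred) & set(factual_gold)
    if not correct:
        return 0.0
    kept = sum(
        (mapping.get(h, h), r, mapping.get(t, t)) in set(counterfactual_pred)
        for (h, r, t) in correct
    )
    return kept / len(correct)


# Invented example document and gold triple.
text = "Marie Curie was born in Warsaw."
gold = [("Marie Curie", "place_of_birth", "Warsaw")]
cf_text, cf_gold = replace_entity(text, gold, "Warsaw", "Lyon")
```

A model that relies on the input context should predict the mapped triple from `cf_text`; a model that relies on memorized facts about the original entities may not, which lowers the score returned by `consistency`.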
Taxonomy
Topics: Natural Language Processing Techniques · Digital Humanities and Scholarship · Web Data Mining and Analysis
