TL;DR
This paper shows that state-of-the-art relation extraction (RE) models trained on TACRED rely on shallow heuristics, which the authors attribute to artifacts of the data annotation process, and introduces a challenge dataset to evaluate and improve how well these models generalize.
Contribution
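The authors build a challenge dataset, Challenging RE (CRE), from naturally occurring corpus examples to expose shallow heuristics in SOTA RE models, and show that an alternative question answering (QA) formulation performs significantly better on this data despite lower overall TACRED performance.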
Findings
The four SOTA RE models evaluated rely on shallow heuristics that do not generalize to CRE.
QA-based models outperform SOTA RE models on the challenge set.
Adding some of the challenge data to the training set improves model performance.
Abstract
The process of collecting and annotating training data may introduce distribution artifacts which may limit the ability of models to learn correct generalization behavior. We identify failure modes of SOTA relation extraction (RE) models trained on TACRED, which we attribute to limitations in the data annotation process. We collect and annotate a challenge-set we call Challenging RE (CRE), based on naturally occurring corpus examples, to benchmark this behavior. Our experiments with four state-of-the-art RE models show that they have indeed adopted shallow heuristics that do not generalize to the challenge-set data. Further, we find that alternative question answering modeling performs significantly better than the SOTA models on the challenge-set, despite worse overall TACRED performance. By adding some of the challenge data as training examples, the performance of the model improves.…
