Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED
Quzhe Huang, Shibo Hao, Yuan Ye, Shengqi Zhu, Yansong Feng, Dongyan, Zhao

TL;DR
This paper critically examines the recommend-revise annotation scheme used in DocRED, revealing its biases and limitations, and provides a relabeled dataset to improve the reliability of document relation extraction evaluations.
Contribution
It identifies biases and false negatives caused by the recommend-revise scheme and releases a relabeled dataset to enhance evaluation reliability.
Findings
Recommend-revise scheme introduces false negatives and biases.
Models trained on DocRED have low recall on relabeled data.
Annotator behavior is influenced by the scheme, affecting data quality.
Abstract
DocRED is a widely used dataset for document-level relation extraction. In the large-scale annotation, a \textit{recommend-revise} scheme is adopted to reduce the workload. Within this scheme, annotators are provided with candidate relation instances from distant supervision, and they then manually supplement and remove relational facts based on the recommendations. However, when comparing DocRED with a subset relabeled from scratch, we find that this scheme results in a considerable amount of false negative samples and an obvious bias towards popular entities and relations. Furthermore, we observe that the models trained on DocRED have low recall on our relabeled dataset and inherit the same bias in the training data. Through the analysis of annotators' behaviors, we figure out the underlying reason for the problems above: the scheme actually discourages annotators from supplementing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
