DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, Maosong Sun

TL;DR
DocRED is a large, challenging dataset for document-level relation extraction that combines human-annotated and distantly supervised data, highlighting the complexity of synthesizing information across multiple sentences.
Contribution
The paper introduces DocRED, the largest human-annotated dataset for document-level RE from plain text. It is released alongside large-scale distantly supervised data, so it supports both supervised and weakly supervised settings.
Findings
Existing RE methods perform poorly on DocRED, indicating its difficulty.
DocRED enables evaluation of document-level RE methods and highlights open challenges.
Analysis suggests future research directions for improving RE models.
Abstract
Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. In order to verify the…
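To make the distinction between intra-sentence and inter-sentence relations concrete, here is a minimal Python sketch. It is not DocRED's actual file format or the authors' code: the classes `Mention`, `Entity`, and `Relation`, the helper `is_inter_sentence`, the toy document, and the relation labels are all hypothetical and chosen only for illustration. The sketch treats a relation as inter-sentence when its head and tail entities are never mentioned in the same sentence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    text: str
    sent_id: int  # index of the sentence containing this mention


@dataclass
class Entity:
    name: str
    mentions: List[Mention]


@dataclass
class Relation:
    head: int  # index into the entity list
    tail: int  # index into the entity list
    label: str


def is_inter_sentence(entities: List[Entity], rel: Relation) -> bool:
    """Return True if head and tail never co-occur in a single sentence."""
    head_sents = {m.sent_id for m in entities[rel.head].mentions}
    tail_sents = {m.sent_id for m in entities[rel.tail].mentions}
    return head_sents.isdisjoint(tail_sents)


# Hypothetical two-sentence document (not taken from the dataset).
sentences = [
    "Ada Lovelace was born in London.",
    "The mathematician later worked with Charles Babbage.",
]
entities = [
    Entity("Ada Lovelace", [Mention("Ada Lovelace", 0)]),
    Entity("London", [Mention("London", 0)]),
    Entity("Charles Babbage", [Mention("Charles Babbage", 1)]),
]
relations = [
    Relation(head=0, tail=1, label="place of birth"),  # both in sentence 0
    Relation(head=0, tail=2, label="collaborator"),    # spans sentences 0 and 1
]

for rel in relations:
    kind = "inter-sentence" if is_inter_sentence(entities, rel) else "intra-sentence"
    print(entities[rel.head].name, rel.label, entities[rel.tail].name, kind)
```

In this toy case the second relation can only be recovered by linking "The mathematician" in the second sentence back to the entity introduced in the first, which is the kind of cross-sentence synthesis the abstract describes.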
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
