Predicting Document Coverage for Relation Extraction
Sneha Singhania, Simon Razniewski, Gerhard Weikum

TL;DR
This paper introduces the task of predicting document coverage for relation extraction, presenting a dataset and models to estimate how well a document covers relational information for entities, aiding knowledge base construction.
Contribution
It defines a new coverage prediction task, provides a dataset of 31,366 documents for 520 entities, and develops models that combine document features with BERT to estimate a document's coverage for relation extraction.
Findings
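Each individual feature (length, entity mention frequency, Alexa rank, language complexity, information retrieval scores) has only moderate predictive power.
The HERB model, which combines features with BERT, achieves an F1 score of up to 46%.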
Coverage prediction improves document selection for KB construction and claim refutation.
Abstract
This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim…
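The feature analysis described in the abstract can be illustrated with a minimal sketch. It assumes that a document's coverage is the fraction of an entity's known relational tuples that the document mentions, and that correlation is measured with the Pearson coefficient. Both choices are assumptions made for illustration, not details confirmed by the abstract. The helper names `coverage` and `pearson` and the toy data are hypothetical.

```python
import math


def coverage(doc_tuples, kb_tuples):
    """Fraction of an entity's known tuples found in a document.

    Assumed definition for illustration; the paper may define coverage differently.
    """
    known = set(kb_tuples)
    if not known:
        return 0.0
    return len(set(doc_tuples) & known) / len(known)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: known tuples for one entity and three documents.
kb = [("E", "bornIn", "X"), ("E", "worksAt", "Y"),
      ("E", "spouse", "Z"), ("E", "award", "W")]
docs = [
    {"length": 200, "tuples": [("E", "bornIn", "X")]},
    {"length": 900, "tuples": [("E", "bornIn", "X"), ("E", "worksAt", "Y")]},
    {"length": 1500, "tuples": kb[:3]},
]

lengths = [d["length"] for d in docs]
coverages = [coverage(d["tuples"], kb) for d in docs]
print("coverage per document:", coverages)
print("length vs. coverage correlation:", pearson(lengths, coverages))
```

The same `pearson` call could be repeated for each feature named in the abstract (entity mention frequency, Alexa rank, and so on) to compare their individual predictive power.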
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Balanced Selection · Dropout · Attention Dropout · Dense Connections · Weight Decay · Linear Warmup With Linear Decay
