Knowledge Base Population using Semantic Label Propagation
Lucas Sterckx, Thomas Demeester, Johannes Deleu, Chris Develder

TL;DR
This paper introduces Semantic Label Propagation, a method that enhances training data quality for relation extractors by combining distant supervision with semantic similarity, significantly improving knowledge base population with minimal manual effort.
Contribution
The paper presents Semantic Label Propagation, a novel technique that extends noisy training sets using semantic similarity, boosting relation extraction performance with minimal manual labeling.
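As a rough illustration of extending a training set by semantic similarity, the sketch below propagates a label from a few trusted seed patterns to unlabeled candidates that are close to them. The bag-of-words cosine similarity, the `propagate` helper, and the 0.5 threshold are assumptions made for this example and are not taken from the paper, which may use a different representation and similarity measure.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector for a text (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate(seeds, candidates, threshold=0.5):
    """Return the candidates whose best similarity to any seed reaches the threshold.

    The threshold is a hypothetical tuning parameter for this sketch.
    """
    seed_vecs = [bow(s) for s in seeds]
    selected = []
    for cand in candidates:
        best = max((cosine(bow(cand), s) for s in seed_vecs), default=0.0)
        if best >= threshold:
            selected.append(cand)
    return selected


# Seeds stand in for a small set of manually confirmed patterns.
SEEDS = ["was born in"]
CANDIDATES = ["was born in the", "returned to the"]
SELECTED = propagate(SEEDS, CANDIDATES)
```

Here only the first candidate shares enough words with the seed to receive the propagated label; the second has no overlap and is left out.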
Findings
Semantic Label Propagation improves precision and recall in relation extraction.
The method achieves substantial performance gains over existing approaches.
Manual annotation effort is nearly eliminated with the proposed strategy.
Abstract
A crucial aspect of a knowledge base population system that extracts new facts from text corpora is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, a method that constructs training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called…
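The distant supervision step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Fact` type, the `distant_labels` helper, the substring-based entity matching, and the toy corpus are all assumptions made for the example.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    object: str


def distant_labels(sentences, facts):
    """Label each sentence that mentions both entities of a known fact.

    Matching is plain substring search, a simplification for this sketch;
    a real system would rely on entity recognition and linking.
    """
    labeled = []
    for sentence in sentences:
        for fact in facts:
            if fact.subject in sentence and fact.object in sentence:
                labeled.append((sentence, fact.relation))
    return labeled


# Toy knowledge base and corpus (hypothetical data).
KB = [Fact("Barack Obama", "born_in", "US")]
CORPUS = [
    "Barack Obama was born in Hawaii, US.",
    "Barack Obama returned to the US after the summit.",
    "Angela Merkel visited Paris.",
]
LABELED = distant_labels(CORPUS, KB)
```

Both sentences that mention 'Barack Obama' and 'US' are labeled born_in, although the second does not express that relation. This is the kind of noise the abstract refers to.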
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques
