TL;DR
This paper introduces a quick, user-friendly method for bootstrapping relation extraction datasets using syntactic search engines, enabling non-experts to create effective models without extensive manual annotation.
Contribution
The paper presents a novel approach combining syntactic search with NLG data augmentation to efficiently generate training data for relation extractors, reducing reliance on manual labeling.
Findings
Models trained on data obtained through syntactic search are competitive with models trained on manually annotated data and on data from distant supervision.
The combined approach outperforms NLG data augmentation alone.
The method enables non-experts to quickly create effective relation extraction datasets.
Abstract
The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work, we propose a process for bootstrapping training datasets that can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)), which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with…
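The by-example search step described in the abstract can be illustrated with a toy sketch. This is not the authors' system: the syntactic-graph search engine is replaced here by a plain surface regular expression, and every name below (`Example`, `pattern_from_seed`, `bootstrap_positives`, the seed sentence, and the tiny corpus) is a hypothetical stand-in chosen for illustration. The idea it shows is only the general one: a user marks the two arguments in one example sentence, the example is generalized into a query, and matching sentences are collected as positive training examples.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    sentence: str
    head: str
    tail: str
    relation: str


# Assumption: an argument slot is approximated by one or more capitalized words.
# A real syntactic search engine would match on parse structure instead.
NAME = r"([A-Z]\w*(?: [A-Z]\w*)*)"


def pattern_from_seed(sentence: str, head: str, tail: str) -> re.Pattern:
    """Generalize a seed sentence by keeping only the text between its two arguments."""
    start = sentence.index(head) + len(head)
    end = sentence.index(tail, start)
    connector = sentence[start:end].strip()
    return re.compile(NAME + r"\s+" + re.escape(connector) + r"\s+" + NAME)


def bootstrap_positives(seed: Example, corpus: list[str]) -> list[Example]:
    """Collect corpus sentences that match the pattern derived from the seed."""
    pattern = pattern_from_seed(seed.sentence, seed.head, seed.tail)
    found = []
    for sentence in corpus:
        match = pattern.search(sentence)
        if match:
            found.append(Example(sentence, match.group(1), match.group(2), seed.relation))
    return found


# Illustrative seed and corpus (not taken from the paper's data).
SEED = Example("Microsoft was founded by Paul Allen.", "Microsoft", "Paul Allen", "org:founded_by")
CORPUS = [
    "Apple was founded by Steve Jobs in a garage.",
    "The bakery was closed by noon.",
    "Tesla was founded by Martin Eberhard and Marc Tarpenning.",
    "She visited Paris last year.",
]

if __name__ == "__main__":
    for example in bootstrap_positives(SEED, CORPUS):
        print(example.relation, example.head, example.tail, sep="\t")
```

A surface pattern like this misses paraphrases such as passive/active alternations or intervening modifiers, which is the kind of variation that searching over syntactic structure is meant to handle.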
