A bag-of-concepts model improves relation extraction in a narrow knowledge domain with limited data
Jiyu Chen, Karin Verspoor, Zenan Zhai

TL;DR
This paper introduces a bag-of-concepts approach that leverages word embeddings and synonyms to improve relation extraction in a limited-data, narrow-domain setting, specifically clinical breast cancer treatment texts.
Contribution
It presents a novel bag-of-concepts feature engineering method that enhances relation extraction performance with small datasets in specialized domains.
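To make the idea of a bag-of-concepts feature concrete, here is a minimal sketch. It is not the authors' implementation: the paper derives concepts from word embeddings and synonyms, whereas this sketch assumes a small hand-written synonym table (the hypothetical `CONCEPT_OF` mapping and its concept labels are invented for illustration) and simply counts concepts instead of surface words.

```python
from collections import Counter

# Hypothetical synonym table (an assumption, not from the paper):
# each surface word maps to a shared concept label.
CONCEPT_OF = {
    "tumour": "TUMOR",
    "tumor": "TUMOR",
    "lump": "TUMOR",
    "tablet": "MEDICATION_FORM",
    "pill": "MEDICATION_FORM",
}


def bag_of_concepts(tokens, concept_of=CONCEPT_OF):
    """Count concepts rather than surface words.

    Words missing from the table fall back to their lowercased form,
    so the result degrades to a plain bag of words for unknown terms.
    """
    return Counter(concept_of.get(tok.lower(), tok.lower()) for tok in tokens)


features = bag_of_concepts(["Tumour", "lump", "pill", "reviewed"])
```

Because synonyms collapse into one feature, a classifier trained on few examples sees fewer, denser dimensions than it would with a bag of words, which is the intuition behind using such features on a small corpus.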
Findings
The BoC approach outperforms traditional methods
The WBC method effectively captures relevant context
The method is effective on small, domain-specific datasets
Abstract
This paper focuses on a traditional relation extraction task in the context of limited annotated data and a narrow knowledge domain. We explore this task with a clinical corpus consisting of 200 breast cancer follow-up treatment letters in which 16 distinct types of relations are annotated. We experiment with an approach to extracting typed relations called window-bounded co-occurrence (WBC), which uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on the state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing with methods of higher complexity, and explore its effectiveness on this small dataset.
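The difference between window-bounded co-occurrence and the intra-sentential baseline can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the `Mention` record, the entity types `"Drug"` and `"Dose"`, and the use of token distance between mention start positions as the window measure are all hypothetical choices made here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    text: str
    etype: str      # entity type (hypothetical labels such as "Drug")
    token_idx: int  # position of the mention's first token in the document
    sent_idx: int   # index of the sentence containing the mention


def _typed(mentions, etype):
    return [m for m in mentions if m.etype == etype]


def wbc_pairs(mentions, head_type, tail_type, window):
    """Window-bounded co-occurrence: pair mentions of the two relevant
    types whose token distance is at most `window`, ignoring sentence
    boundaries."""
    return [
        (h, t)
        for h, t in product(_typed(mentions, head_type), _typed(mentions, tail_type))
        if abs(h.token_idx - t.token_idx) <= window
    ]


def sentence_pairs(mentions, head_type, tail_type):
    """Baseline: pair mentions of the two types that share a sentence."""
    return [
        (h, t)
        for h, t in product(_typed(mentions, head_type), _typed(mentions, tail_type))
        if h.sent_idx == t.sent_idx
    ]


mentions = [
    Mention("tamoxifen", "Drug", 3, 0),
    Mention("20mg", "Dose", 4, 0),
    Mention("letrozole", "Drug", 11, 1),
    Mention("2.5mg", "Dose", 13, 1),
]
narrow = wbc_pairs(mentions, "Drug", "Dose", window=2)
wide = wbc_pairs(mentions, "Drug", "Dose", window=8)
baseline = sentence_pairs(mentions, "Drug", "Dose")
```

With a narrow window the candidate set here matches the sentence baseline, while a wider window also admits a cross-sentence pair; tuning the window per relation type is what makes the method adjustable.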
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
