SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble
Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han

TL;DR
SetExpan is a novel framework for corpus-based set expansion that improves entity discovery by selecting clean context features and using an ensemble ranking method, outperforming previous approaches.
Contribution
It introduces a new context feature selection and rank ensemble approach to enhance set expansion accuracy in noisy corpora.
Findings
SetExpan outperforms state-of-the-art methods in mean average precision.
The framework effectively reduces noise from context features.
Experiments on three datasets demonstrate robustness and improved accuracy.
Abstract
Corpus-based set expansion (i.e., finding the "complete" set of entities belonging to the same semantic class, based on a given corpus and a tiny set of seeds) is a critical task in knowledge discovery. It may facilitate numerous downstream applications, such as information extraction, taxonomy induction, question answering, and web search. To discover new entities in an expanded set, previous approaches either perform a one-time entity ranking based on distributional similarity or resort to iterative pattern-based bootstrapping. The core challenge for these methods is how to deal with noisy context features derived from free-text corpora, which may lead to entity intrusion and semantic drifting. In this study, we propose a novel framework, SetExpan, which tackles this problem with two techniques: (1) a context feature selection method that selects clean context features for calculating…
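
The two ideas named in the abstract and title, context feature selection and rank ensemble, can be illustrated with a small sketch. This is not the authors' implementation. The toy `weights` table, the function names (`select_features`, `similarity`, `expand`), the weighted Jaccard similarity, the random feature subsampling, and the reciprocal-rank aggregation are all assumptions chosen for illustration, since the truncated abstract does not specify them.

```python
import random
from collections import defaultdict

def select_features(seeds, weights, k):
    """Score each context feature by its total weight over the seeds; keep the top k."""
    scores = defaultdict(float)
    for seed in seeds:
        for feat, w in weights.get(seed, {}).items():
            scores[feat] += w
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]

def similarity(a, b, feats, weights):
    """Weighted Jaccard similarity restricted to the given features (an assumed choice)."""
    wa, wb = weights.get(a, {}), weights.get(b, {})
    num = sum(min(wa.get(f, 0.0), wb.get(f, 0.0)) for f in feats)
    den = sum(max(wa.get(f, 0.0), wb.get(f, 0.0)) for f in feats)
    return num / den if den else 0.0

def expand(seeds, weights, k=4, n_rankings=20, sample_ratio=0.75, top_n=3, rng_seed=0):
    """Rank candidates on several random feature subsets, then ensemble the rankings."""
    rng = random.Random(rng_seed)
    feats = select_features(seeds, weights, k)
    candidates = sorted(e for e in weights if e not in seeds)
    sample_size = min(len(feats), max(1, int(len(feats) * sample_ratio)))
    reciprocal_rank = defaultdict(float)
    for _ in range(n_rankings):
        subset = rng.sample(feats, sample_size)
        ranked = sorted(
            candidates,
            key=lambda e: (-sum(similarity(e, s, subset, weights) for s in seeds), e),
        )
        # Each ranking votes with 1/rank; noisy single rankings get averaged out.
        for rank, entity in enumerate(ranked, start=1):
            reciprocal_rank[entity] += 1.0 / rank
    return sorted(candidates, key=lambda e: (-reciprocal_rank[e], e))[:top_n]

# Hypothetical entity -> {context feature: co-occurrence weight} table.
weights = {
    "Texas":   {"state of _": 3, "governor of _": 2, "_ said": 1},
    "Ohio":    {"state of _": 2, "governor of _": 3, "_ said": 1},
    "Florida": {"state of _": 3, "governor of _": 2, "_ said": 2},
    "Nevada":  {"state of _": 2, "governor of _": 1},
    "Obama":   {"_ said": 5, "president _": 4},
    "Chicago": {"city of _": 4, "_ said": 1},
}
seeds = ["Texas", "Ohio"]
selected = select_features(seeds, weights, k=2)
result = expand(seeds, weights, k=2, top_n=2)
print(selected, result)
```

In this toy setup the generic feature `_ said` is dropped by the selection step, so entities that share only that noisy context with the seeds (such as `Obama`) are not pulled into the expanded set.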
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Feature Selection
