SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery
Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han

TL;DR
SynSetExpan introduces an iterative framework that jointly improves entity set expansion and synonym discovery by leveraging their interdependence, validated on a new large-scale dataset and benchmarks.
Contribution
The paper proposes a novel iterative framework that jointly enhances entity set expansion and synonym discovery, and introduces the first large-scale dataset for these tasks.
Findings
The framework improves both entity set expansion and synonym discovery.
It outperforms previous methods on the SE2 dataset and on existing benchmarks.
The authors create the first large-scale dataset for studying the two tasks jointly.
Abstract
Entity set expansion and synonym discovery are two critical NLP tasks. Previous studies accomplish them separately, without exploring their interdependencies. In this work, we hypothesize that these two tasks are tightly coupled because two synonymous entities tend to have similar likelihoods of belonging to various semantic classes. This motivates us to design SynSetExpan, a novel framework that enables two tasks to mutually enhance each other. SynSetExpan uses a synonym discovery model to include popular entities' infrequent synonyms into the set, which boosts the set expansion recall. Meanwhile, the set expansion model, being able to determine whether an entity belongs to a semantic class, can generate pseudo training data to fine-tune the synonym discovery model towards better accuracy. To facilitate the research on studying the interplays of these two tasks, we create the first…
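The mutual-enhancement loop described in the abstract can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the `FEATURES` data, the Jaccard-based `class_score` standing in for the set expansion model, the string-based `synonym_score` standing in for the synonym discovery model, the thresholds, and the use of pseudo-negative pairs as a simple veto in place of actual fine-tuning.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy data: context features observed for each entity.
FEATURES = {
    "united states": {"country", "nation", "government", "border"},
    "canada": {"country", "nation", "government", "border"},
    "mexico": {"country", "nation", "border"},
    "usa": {"country", "nation", "government"},
    "u.s.a.": {"country"},  # rare surface form with sparse features
    "toronto": {"city", "mayor"},
    "torino": {"city", "club"},
}


def jaccard(a, b):
    """Jaccard overlap between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def class_score(entity, current_set):
    """Stand-in set expansion model: mean feature overlap with current members."""
    total = sum(jaccard(FEATURES[entity], FEATURES[m]) for m in current_set)
    return total / len(current_set)


def normalize(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def synonym_score(a, b):
    """Stand-in synonym model: similarity of normalized surface forms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def syn_set_expand(seeds, rounds=2, class_threshold=0.6,
                   syn_threshold=0.9, negative_threshold=0.2):
    """Alternate set expansion and synonym discovery for a few rounds."""
    expanded = set(seeds)
    pseudo_negatives = set()  # pairs assumed NOT to be synonyms
    for _ in range(rounds):
        candidates = [e for e in FEATURES if e not in expanded]
        scores = {e: class_score(e, expanded) for e in candidates}

        # Step 1: set expansion accepts candidates that look like class members.
        members = expanded | {e for e in candidates if scores[e] >= class_threshold}

        # Step 2: synonym discovery pulls in rare synonyms of current members,
        # skipping pairs already marked as pseudo-negatives.
        synonyms = {
            e for e in candidates
            if e not in members and any(
                synonym_score(e, m) >= syn_threshold
                and frozenset((e, m)) not in pseudo_negatives
                for m in members
            )
        }
        members |= synonyms

        # Step 3: a member and a clear non-member are unlikely to be synonyms;
        # record such pairs as pseudo training data (assumed labeling rule).
        for e in candidates:
            if e not in members and scores[e] <= negative_threshold:
                pseudo_negatives.update(frozenset((e, m)) for m in members)

        expanded = members
    return expanded, pseudo_negatives


expanded, negatives = syn_set_expand({"united states", "canada"})
print(sorted(expanded))
```

In this toy run, "u.s.a." scores too low on feature overlap to be accepted by the set expansion stand-in alone, and it enters the set only because the synonym stand-in links it to "usa". That mirrors the recall argument in the abstract, under the simplifying assumptions above.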
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
