Who Should Go First? A Self-Supervised Concept Sorting Model for Improving Taxonomy Expansion
Xiangchen Song, Jiaming Shen, Jieyu Zhang, and Jiawei Han

TL;DR
This paper introduces TaxoOrder, a self-supervised framework that improves taxonomy expansion by discovering local hypernym-hyponym structures and determining the optimal insertion order, thereby enhancing the quality of expanded taxonomies.
Contribution
The paper presents a novel self-supervised method that models dependencies among new concepts and optimizes their insertion order in taxonomy expansion tasks.
Findings
TaxoOrder improves taxonomy quality across various metrics.
It effectively discovers local hypernym-hyponym structures.
It enhances existing taxonomy expansion systems with minimal integration effort.
Abstract
Taxonomies have been widely used in various machine learning and text mining systems to organize knowledge and facilitate downstream tasks. One critical challenge is that, as data and business scope grow in real applications, existing taxonomies need to be expanded to incorporate new concepts. Previous works on taxonomy expansion process the new concepts independently and simultaneously, ignoring the potential relationships among them and the appropriate order of inserting operations. However, in reality, the new concepts tend to be mutually correlated and form local hypernym-hyponym structures. In such a scenario, ignoring the dependencies of new concepts and the order of insertion may trigger error propagation. For example, existing taxonomy expansion systems may insert hyponyms to existing taxonomies before their hypernym, leading to sub-optimal expanded taxonomies. To complement…
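The ordering issue described in the abstract (a hypernym should be inserted before its hyponyms) can be illustrated with a small sketch. This is not the TaxoOrder model, whose details are not given in the text above. It assumes the local hypernym-hyponym pairs among the new concepts are already known, and it uses a plain topological sort (Kahn's algorithm) to derive an insertion order. The function name `insertion_order`, the edge format, and the example concepts are all hypothetical.

```python
from collections import defaultdict, deque


def insertion_order(concepts, hypernym_edges):
    """Order new concepts so that every hypernym precedes its hyponyms.

    concepts: iterable of new concept names (hypothetical input).
    hypernym_edges: iterable of (hypernym, hyponym) pairs among those
        concepts, assumed to come from some upstream predictor.
    """
    concepts = list(concepts)
    children = defaultdict(list)
    indegree = {c: 0 for c in concepts}

    for parent, child in hypernym_edges:
        if parent not in indegree or child not in indegree:
            raise ValueError(f"edge ({parent!r}, {child!r}) uses an unknown concept")
        children[parent].append(child)
        indegree[child] += 1

    # Start with concepts that have no hypernym among the new concepts.
    queue = deque(c for c in concepts if indegree[c] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any concept left unplaced sits on a cycle, so no valid order exists.
    if len(order) != len(concepts):
        raise ValueError("hypernym edges contain a cycle")
    return order


if __name__ == "__main__":
    new_concepts = ["poodle", "dog", "mammal"]
    edges = [("mammal", "dog"), ("dog", "poodle")]
    print(insertion_order(new_concepts, edges))  # ['mammal', 'dog', 'poodle']
```

Inserting in this order means that when "poodle" is processed, its hypernym "dog" is already in the taxonomy and available as a parent candidate. Python's standard library also offers `graphlib.TopologicalSorter` for the same purpose.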
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
