TaxoAlign: Scholarly Taxonomy Generation Using Language Models
Avishek Lahiri, Yufang Hou, Debarshi Kumar Sanyal

TL;DR
TaxoAlign is a novel method that automatically generates scholarly taxonomies, aligning closely with human-created structures, and is validated through a new benchmark and comprehensive evaluation framework.
Contribution
We introduce TaxoAlign, a three-phase instruction-guided approach for scholarly taxonomy creation, along with the CS-TaxoBench benchmark and an automated evaluation framework.
Findings
TaxoAlign outperforms baseline methods on most metrics.
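The CS-TaxoBench benchmark includes 460 taxonomies extracted from human-written survey papers, plus an additional test set of 80 taxonomies curated from conference survey papers.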
Automated and human evaluations confirm TaxoAlign's effectiveness.
Abstract
Taxonomies play a crucial role in helping researchers structure and navigate knowledge hierarchically. They are also an important part of creating comprehensive literature surveys. Existing approaches to automatic survey generation do not compare the structure of the generated surveys with that of surveys written by human experts. To address this gap, we present a method for automated taxonomy creation that brings automatically created taxonomies closer to human-generated ones. For this purpose, we create the CS-TaxoBench benchmark, which consists of 460 taxonomies extracted from human-written survey papers. We also include an additional test set of 80 taxonomies curated from conference survey papers. We propose TaxoAlign, a three-phase topic-based instruction-guided method for scholarly taxonomy generation. Additionally, we propose a stringent…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Information Retrieval and Search Behavior
