SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models
Shiqiang Cai, Nianhong Niu, Shizhu He, Kang Liu, Jun Zhao

TL;DR
SC-Taxo is a framework that uses large language models to generate hierarchical scientific taxonomies with improved semantic consistency and structural accuracy.
Contribution
It introduces a hierarchy-aware refinement process with bidirectional heading generation to address semantic misalignment in taxonomy creation.
Findings
Improves hierarchy alignment and heading quality on benchmark datasets.
Demonstrates robust cross-lingual generalization to Chinese scientific literature.
Abstract
Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge. A high-quality scientific taxonomy offers a structured and hierarchical representation of a research field, facilitating literature exploration and topic navigation, as well as enabling downstream applications such as trend analysis, idea generation, and information retrieval. However, existing taxonomy generation approaches often suffer from structural inconsistencies and semantic misalignment across hierarchical levels. Through empirical analysis, we find that these issues largely stem from inadequate modeling of hierarchical semantic consistency. To address this limitation, we propose a semantic-consistent taxonomy generation (SC-Taxo) framework that leverages large language models (LLMs) with hierarchy-aware refinement stages to ensure…
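To make the idea of hierarchical semantic consistency concrete, here is a minimal toy sketch of checking whether each child heading in a taxonomy stays on topic with its parent. It is not the SC-Taxo method: the abstract does not specify the refinement stages, so everything here is an assumption for illustration. The `Node` class, the `overlap` word-overlap score (a crude stand-in for an LLM or embedding-based judgment), the `flag_inconsistent` helper, and the 0.2 threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One taxonomy heading plus its child headings (hypothetical structure)."""
    heading: str
    children: list[Node] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    # Toy tokenizer: lowercase, whitespace-separated words.
    return set(text.lower().split())


def overlap(parent: str, child: str) -> float:
    # Jaccard similarity of the two headings' word sets.
    # Assumption: a real system would use a semantic score instead.
    a, b = tokens(parent), tokens(child)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_inconsistent(root: Node, threshold: float = 0.2) -> list[tuple[str, str]]:
    """Return (parent, child) heading pairs whose overlap is below threshold."""
    flagged = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if overlap(node.heading, child.heading) < threshold:
                flagged.append((node.heading, child.heading))
            stack.append(child)
    return flagged


# Tiny example tree: one on-topic child, one off-topic child.
example = Node(
    "taxonomy generation",
    [Node("llm based taxonomy generation"), Node("image segmentation")],
)
print(flag_inconsistent(example))
```

In this toy tree, only the off-topic child shares no words with its parent, so it is the only pair flagged. A refinement step could then regenerate or relocate flagged headings, though how SC-Taxo does this is not described in the text above.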
