TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora
Priyanka Kargupta, Nan Zhang, Yunyi Zhang, Rui Zhang, Prasenjit Mitra, Jiawei Han

TL;DR
TaxoAdapt is a dynamic framework that adapts LLM-generated multidimensional taxonomies to evolving scientific corpora, improving granularity and coherence in organizing research literature across multiple dimensions.
Contribution
TaxoAdapt introduces an iterative hierarchical classification method that dynamically adapts taxonomies to evolving scientific domains. It addresses two gaps: the limited generalizability of corpus-specific methods and the multi-faceted nature of scientific literature.
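The idea of iterative hierarchical classification can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap test stands in for an LLM relevance judgment, and the names `Node`, `classify`, `unmapped`, `adapt`, `propose`, and the `max_unmapped` threshold are all hypothetical. The sketch assigns each paper top-down to every matching node of one dimension's taxonomy, then adds a new child under the root whenever too many papers fit none of the existing children.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    label: str
    keywords: set[str]                      # toy stand-in for an LLM's notion of the topic
    children: list[Node] = field(default_factory=list)
    papers: list[str] = field(default_factory=list)


def matches(node: Node, title: str) -> bool:
    # Assumption: keyword overlap replaces an LLM relevance judgment.
    return bool(node.keywords & set(title.lower().split()))


def classify(node: Node, title: str) -> None:
    # Top-down: record the paper here, then descend into every matching child.
    node.papers.append(title)
    for child in node.children:
        if matches(child, title):
            classify(child, title)


def reset(node: Node) -> None:
    # Clear all assignments so classification can be rerun from scratch.
    node.papers.clear()
    for child in node.children:
        reset(child)


def unmapped(node: Node) -> list[str]:
    # Papers assigned to this node that reached none of its children.
    covered = {p for c in node.children for p in c.papers}
    return [p for p in node.papers if p not in covered]


def adapt(root: Node, titles: list[str],
          propose: Callable[[list[str]], Node],
          max_unmapped: int = 1, max_rounds: int = 3) -> Node:
    # Iterate: classify the corpus, then grow the root's children while
    # too many papers remain uncovered. Only the root level is adapted here.
    for _ in range(max_rounds):
        reset(root)
        for title in titles:
            classify(root, title)
        leftover = unmapped(root)
        if len(leftover) <= max_unmapped:
            break
        root.children.append(propose(leftover))
    return root


# Toy corpus for a single dimension ("methodology").
root = Node("methodology", set(), [Node("retrieval", {"retrieval"})])
titles = [
    "dense retrieval for qa",
    "prompt tuning at scale",
    "prompt compression methods",
]
# Hypothetical proposer; a real system would derive the new node from the leftover papers.
adapt(root, titles, propose=lambda leftover: Node("prompting", {"prompt"}))
labels = [child.label for child in root.children]
```

In this toy run the two prompt-related titles match no existing child, so one new node is added and the corpus is reclassified. A multidimensional setup would keep one such tree per dimension (for example methodology, tasks, evaluation metrics, benchmarks) and run the same loop on each.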
Findings
Achieves 26.51% more granularity preservation than baselines.
Achieves 50.41% higher coherence compared to competitive methods.
Demonstrates effectiveness across diverse computer science conferences.
Abstract
The rapid evolution of scientific fields introduces challenges in organizing and retrieving scientific literature. While expert-curated taxonomies have traditionally addressed this need, the process is time-consuming and expensive. Furthermore, recent automatic taxonomy construction methods either (1) over-rely on a specific corpus, sacrificing generalizability, or (2) depend heavily on the general knowledge of large language models (LLMs) contained within their pre-training datasets, often overlooking the dynamic nature of evolving scientific domains. Additionally, these approaches fail to account for the multi-faceted nature of scientific literature, where a single research paper may contribute to multiple dimensions (e.g., methodology, new tasks, evaluation metrics, benchmarks). To address these gaps, we propose TaxoAdapt, a framework that dynamically adapts an LLM-generated taxonomy…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Computational and Text Analysis Methods
Methods: Sparse Evolutionary Training
