Hierarchical Text Classification with LLM-Refined Taxonomies

Jonas Golde; Nicolaas Jedema; Ravi Krishnan; Phong Le

arXiv:2601.18375 · cs.CL · January 27, 2026

TL;DR

This paper introduces TaxMorph, a framework that uses large language models to refine hierarchical taxonomies, leading to improved text classification performance by better aligning the taxonomy structure with model semantics.

Contribution

TaxMorph is presented as the first method to revise entire taxonomies using LLMs. It improves hierarchical text classification (HTC) by aligning the taxonomy more closely with the model, and its refined taxonomies outperform human-curated hierarchies.

Findings

1. LLM-refined taxonomies outperform human-curated ones by up to +2.9 pp in F1.

2. Refined taxonomies align more closely with model confusion patterns.

3. Human-curated taxonomies produce more separable clusters in embedding space.

Abstract

Hierarchical text classification (HTC) depends on taxonomies that organize labels into structured hierarchies. However, many real-world taxonomies introduce ambiguities, such as identical leaf names under similar parent nodes, which prevent language models (LMs) from learning clear decision boundaries. In this paper, we present TaxMorph, a framework that uses large language models (LLMs) to transform entire taxonomies through operations such as renaming, merging, splitting, and reordering. Unlike prior work, our method revises the full hierarchy to better match the semantics encoded by LMs. Experiments across three HTC benchmarks show that LLM-refined taxonomies consistently outperform human-curated ones in various settings, by up to +2.9 pp in F1. To better understand these improvements, we compare how well LMs can assign leaf nodes to parent nodes and vice versa across human-curated and…
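
To make the four operation types named in the abstract concrete, here is a minimal sketch on a toy two-level taxonomy. It is not TaxMorph's implementation: the `Taxonomy` class, its method names, the `move` operation standing in for reordering, and the example labels are all assumptions made for illustration, and in the paper an LLM (not hand-written calls) decides which operations to apply.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Toy two-level taxonomy: parent name -> list of leaf names (hypothetical)."""

    children: dict[str, list[str]] = field(default_factory=dict)

    def rename(self, parent: str, old: str, new: str) -> None:
        # Replace one leaf name under a given parent, keeping its position.
        leaves = self.children[parent]
        leaves[leaves.index(old)] = new

    def merge(self, parent: str, leaves: list[str], new: str) -> None:
        # Collapse several leaves under one parent into a single new leaf.
        kept = [leaf for leaf in self.children[parent] if leaf not in leaves]
        self.children[parent] = kept + [new]

    def split(self, parent: str, leaf: str, new_leaves: list[str]) -> None:
        # Replace one leaf with several finer-grained leaves.
        kept = [other for other in self.children[parent] if other != leaf]
        self.children[parent] = kept + list(new_leaves)

    def move(self, leaf: str, src: str, dst: str) -> None:
        # Reattach a leaf to a different parent (a simple stand-in for reordering).
        self.children[src].remove(leaf)
        self.children.setdefault(dst, []).append(leaf)

    def duplicate_leaves(self) -> set[str]:
        # Leaf names that occur under more than one parent: the kind of
        # ambiguity the abstract describes.
        seen: set[str] = set()
        dupes: set[str] = set()
        for leaves in self.children.values():
            for leaf in leaves:
                (dupes if leaf in seen else seen).add(leaf)
        return dupes


# Illustrative labels only; they do not come from the paper's benchmarks.
tax = Taxonomy({
    "Phones": ["Accessories", "Cases", "Covers"],
    "Tablets": ["Accessories", "Styluses"],
})
ambiguous_before = tax.duplicate_leaves()

tax.rename("Phones", "Accessories", "Phone Accessories")
tax.rename("Tablets", "Accessories", "Tablet Accessories")
tax.merge("Phones", ["Cases", "Covers"], "Cases & Covers")
tax.move("Styluses", "Tablets", "Input Devices")
ambiguous_after = tax.duplicate_leaves()
```

In this sketch the identical leaf name `Accessories` appears under two parents before refinement, and the two renames remove that ambiguity. A real pipeline would also need to remap the training labels after each operation, which is omitted here.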

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Machine Learning in Healthcare