Making Language Model a Hierarchical Classifier

Yihong Wang; Zhonglin Jiang; Ningyuan Xi; Yue Zhao; Qingqing Gu; Xiyuan Chen; Hao Wu; Sheng Xu; Hange Zhou; Yong Chen; Luo Ji

arXiv:2507.12930·cs.CL·September 30, 2025

TL;DR

This paper introduces a hierarchical decoder architecture for language models, enabling simultaneous multi-layer decoding to improve performance on hierarchical tasks and reduce computational costs.

Contribution

It proposes a novel hierarchical decoder built by adapting pretrained models, with layers decoding simultaneously, achieving state-of-the-art results on multiple hierarchical NLP tasks.

Findings

01 Outperforms baselines on WoS, DBpedia, ESConv, and EmpatheticDialogues

02 Validates meaningful content generation at intermediate layers

03 Provides theoretical analysis of convergence and efficiency

Abstract

Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by humans' capacity for hierarchical thinking, we propose a hierarchical decoder architecture in which different layers decode text simultaneously. Because of limited time and computational resources, we adapt a pretrained language model into this form of hierarchical decoder. The language heads of the last layer are copied to selected intermediate layers and fine-tuned with different task inputs. Through thorough experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and reasonable content, and that this hierarchical decoder paradigm achieves state-of-the-art performance on multiple tasks, such as hierarchical text classification, classification-guided generation, and hierarchical text generation. HdLM outperforms…
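
The head-copying step described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `ToyHierarchicalDecoder`, the layer sizes, the choice of layer 2 as the intermediate decoding layer, and the use of small feed-forward blocks in place of pretrained transformer layers are all assumptions made for illustration. It only shows how a final-layer language head might be duplicated onto a selected intermediate layer so that both depths yield logits from a single forward pass.

```python
import copy

import torch
import torch.nn as nn


class ToyHierarchicalDecoder(nn.Module):
    """Toy stand-in for a decoder-only LM with extra heads on chosen layers."""

    def __init__(self, vocab_size=50, hidden=16, num_layers=4, head_layers=(2,)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Hypothetical stand-in blocks; a real model would reuse the
        # pretrained transformer layers here.
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.GELU())
             for _ in range(num_layers)]
        )
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        # Copy the last-layer language head to each selected intermediate
        # layer. deepcopy gives each copy its own parameters, so the copies
        # can later be fine-tuned independently.
        self.extra_heads = nn.ModuleDict(
            {str(i): copy.deepcopy(self.lm_head) for i in head_layers}
        )

    def forward(self, token_ids):
        h = self.embed(token_ids)
        logits = {}
        for depth, block in enumerate(self.blocks, start=1):
            h = block(h)
            if str(depth) in self.extra_heads:
                # Decode from the intermediate hidden state.
                logits[depth] = self.extra_heads[str(depth)](h)
        # The last layer decodes with the original head.
        logits[len(self.blocks)] = self.lm_head(h)
        return logits


torch.manual_seed(0)
model = ToyHierarchicalDecoder()
tokens = torch.randint(0, 50, (1, 5))
outputs = model(tokens)  # dict mapping layer depth to logits
```

Here `outputs` holds one logits tensor per decoding depth. The fine-tuning of each head with different task inputs, which the abstract describes, is outside the scope of this sketch.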

Code & Models

Repositories

coswindywang/HdLM (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Dropout · LLaMA · Attention Dropout · Cosine Annealing · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding · Layer Normalization · Dense Connections