Language Models as Hierarchy Encoders
Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks

TL;DR
This paper introduces Hierarchy Transformer encoders (HiTs), which explicitly encode hierarchical structures in language using hyperbolic space. They outperform pre-trained and standard fine-tuned language models on tasks such as transitive inference and subsumption prediction.
Contribution
The paper presents a novel method for re-training transformer encoder-based language models as hierarchy encoders in hyperbolic space, enabling them to encode hierarchical structure explicitly.
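Hyperbolic space suits hierarchies because distances grow rapidly toward the boundary of the ball, which leaves room for the many nodes found deep in a hierarchy. The sketch below illustrates this numerically. It assumes the standard closed-form distance of a Poincaré ball with curvature -c (radius 1/√c). It also sets c = 1/d, so the radius is √d, as an assumed reading of the abstract's "curvature that adapts to the embedding dimension". The helper names are hypothetical and are not the authors' code.

```python
import math


def _sq_norm(x):
    # Squared Euclidean norm of a vector given as a list of floats.
    return sum(t * t for t in x)


def poincare_dist(u, v, c):
    """Geodesic distance in the Poincaré ball of curvature -c (radius 1/sqrt(c))."""
    denom = (1 - c * _sq_norm(u)) * (1 - c * _sq_norm(v))
    if denom <= 0:
        raise ValueError("both points must lie strictly inside the ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * c * diff / denom) / math.sqrt(c)


def hyperbolic_norm(x, c):
    # Distance from the origin: (2 / sqrt(c)) * artanh(sqrt(c) * ||x||).
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * math.sqrt(_sq_norm(x)))


# Assumed dimension-adaptive curvature: c = 1/d, so the ball radius is sqrt(d) = 2.
d = 4
c = 1 / d
origin = [0.0] * d
# Points at Euclidean radii 0.4, 0.8, 1.2, 1.6 along one axis (equal Euclidean steps).
ray = [[r, 0.0, 0.0, 0.0] for r in (0.4, 0.8, 1.2, 1.6)]

# Each equal Euclidean step covers a larger hyperbolic distance than the previous one.
steps = [poincare_dist(a, b, c) for a, b in zip([origin] + ray[:-1], ray)]
print([round(s, 3) for s in steps])
```

The distance from the origin, `hyperbolic_norm`, is the quantity a model could use as a proxy for depth in the hierarchy.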
Findings
- HiTs outperform baselines in transitive inference tasks
- HiTs better predict subsumption relations
- HiTs demonstrate improved transferability across hierarchies
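The sketch below shows one way a hyperbolic hierarchy encoder could turn embeddings into a subsumption prediction. It is an illustration under an assumption the text above does not confirm. The assumed score penalises the Poincaré distance between child and parent and rewards the child for sitting further from the origin than the parent. The depth term is weighted by a hypothetical hyperparameter `lam`. The 2-D vectors are toy values standing in for HiT outputs, and all function names are hypothetical.

```python
import math


def _sq_norm(x):
    # Squared Euclidean norm of a list of floats.
    return sum(t * t for t in x)


def poincare_dist(u, v, c):
    # Closed-form geodesic distance in the Poincaré ball of curvature -c.
    denom = (1 - c * _sq_norm(u)) * (1 - c * _sq_norm(v))
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * c * diff / denom) / math.sqrt(c)


def hyperbolic_norm(x, c):
    # Hyperbolic distance from the origin.
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * math.sqrt(_sq_norm(x)))


def subsumption_score(child, parent, c, lam):
    # Assumed scoring rule: higher means "child is-a parent" is more plausible.
    # Penalise distance; reward the child lying deeper (further from the origin) than the parent.
    depth_gap = hyperbolic_norm(parent, c) - hyperbolic_norm(child, c)
    return -(poincare_dist(child, parent, c) + lam * depth_gap)


c = 1 / 2  # assumed curvature c = 1/d for d = 2
lam = 1.0  # hypothetical weight on the depth term
animal, dog, car = [0.3, 0.0], [0.9, 0.1], [-0.9, 0.2]  # toy embeddings

s_dog_animal = subsumption_score(dog, animal, c, lam)  # true subsumption
s_animal_dog = subsumption_score(animal, dog, c, lam)  # inverted direction
s_car_animal = subsumption_score(car, animal, c, lam)  # unrelated pair
print(round(s_dog_animal, 3), round(s_animal_dog, 3), round(s_car_animal, 3))
```

Because the distance term is symmetric, only the norm-based term separates "dog is-a animal" from its inversion. A binary decision would then need a threshold on the score, for example one tuned on validation data, which is a further assumption.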
Abstract
Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension, followed by training on hyperbolic clustering and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on…
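The two training objectives named above can be sketched as hinge losses over (child, parent, negative) triplets. This is a minimal illustration under stated assumptions:

- The clustering loss is taken to be a triplet loss on Poincaré distances with margin `alpha`.
- The centripetal loss is taken to be a hinge on distance-to-origin with margin `beta`.
- The curvature is c = 1/d.

Function names, margins, and embeddings are hypothetical. A correctly organised toy hierarchy incurs zero loss. Swapping child and parent is penalised by the centripetal term.

```python
import math


def _sq_norm(x):
    # Squared Euclidean norm of a list of floats.
    return sum(t * t for t in x)


def poincare_dist(u, v, c):
    # Closed-form geodesic distance in the Poincaré ball of curvature -c.
    denom = (1 - c * _sq_norm(u)) * (1 - c * _sq_norm(v))
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * c * diff / denom) / math.sqrt(c)


def hyperbolic_norm(x, c):
    # Hyperbolic distance from the origin.
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * math.sqrt(_sq_norm(x)))


def clustering_loss(child, parent, negative, c, alpha):
    # Assumed triplet hinge: the parent should be closer to the child than a negative, by margin alpha.
    return max(poincare_dist(child, parent, c) - poincare_dist(child, negative, c) + alpha, 0.0)


def centripetal_loss(child, parent, c, beta):
    # Assumed hinge on hyperbolic norms: the parent should sit nearer the origin than the child, by margin beta.
    return max(hyperbolic_norm(parent, c) - hyperbolic_norm(child, c) + beta, 0.0)


def hit_style_loss(triplets, c, alpha, beta):
    # Mean of the summed clustering and centripetal losses over (child, parent, negative) triplets.
    total = sum(
        clustering_loss(ch, pa, ne, c, alpha) + centripetal_loss(ch, pa, c, beta)
        for ch, pa, ne in triplets
    )
    return total / len(triplets)


d = 2
c = 1 / d  # assumed dimension-adaptive curvature (ball radius sqrt(d))
animal, dog, car = [0.3, 0.0], [0.9, 0.1], [-0.9, 0.2]  # toy embeddings

good = hit_style_loss([(dog, animal, car)], c, alpha=1.0, beta=0.1)
bad = hit_style_loss([(animal, dog, car)], c, alpha=1.0, beta=0.1)  # hierarchy inverted
print(good, bad)
```

In training, these vectors would be the LM's output embeddings and the losses would be back-propagated through the encoder. That requires an autodiff framework, which this plain-Python sketch deliberately omits.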
Code & Models
- 🤗 Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun · model · 6.4k dl · ♡ 4
- 🤗 Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun · model · 26 dl
- 🤗 Hierarchy-Transformers/HiT-MPNet-WordNetNoun · model · 74 dl
- 🤗 Hierarchy-Transformers/HiT-MiniLM-L12-SnomedCT · model · 93 dl · ♡ 3
- 🤗 WeihaoLi/full_icd · model
- 🤗 WeihaoLi/icd9 · model · 1 dl
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Dropout · Linear Layer · Byte Pair Encoding · Adam
