Language Models as Hierarchy Encoders

Yuan He; Zhangdie Yuan; Jiaoyan Chen; Ian Horrocks

arXiv:2401.11374 · cs.CL · November 22, 2024 · 2 cites

TL;DR

This paper introduces Hierarchy Transformer encoders (HiTs), which explicitly encode the hierarchical structures latent in language using hyperbolic space and outperform standard language models on transitive inference and subsumption prediction.

Contribution

The paper presents a method for re-training transformer encoder-based language models as hierarchy encoders in hyperbolic space, so that hierarchical structure is encoded explicitly rather than only implicitly.

Findings

01. HiTs outperform the baselines on transitive inference tasks.

02. HiTs predict subsumption relations more accurately than the baselines.

03. HiTs transfer better across hierarchies.

Abstract

Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension, followed by training on hyperbolic clustering and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on…
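To make the two losses named in the abstract more concrete, here is a minimal Python sketch. The abstract does not give formulas, so everything specific is an assumption made for illustration: the curvature choice c = 1/d (ball radius sqrt(d)), the triplet-style margin form of the clustering loss, the margin form of the centripetal loss, the margin values, and all function and variable names. Only the Poincaré distance is the standard closed-form expression. It should be read as an illustration of the idea, not as the authors' implementation.

```python
import math


def poincare_dist(u, v, c):
    """Geodesic distance in a Poincare ball of curvature -c (radius 1/sqrt(c))."""
    sq = lambda x: sum(a * a for a in x)
    diff = sq([a - b for a, b in zip(u, v)])
    denom = (1 - c * sq(u)) * (1 - c * sq(v))
    return math.acosh(1 + 2 * c * diff / denom) / math.sqrt(c)


def hyperbolic_norm(u, c):
    """Hyperbolic distance from the origin of the ball."""
    return poincare_dist(u, [0.0] * len(u), c)


def clustering_loss(child, parent, negative, c, margin=1.0):
    # Assumed triplet form: pull a child towards its parent and
    # push it away from an unrelated (negative) entity.
    pos = poincare_dist(child, parent, c)
    neg = poincare_dist(child, negative, c)
    return max(pos - neg + margin, 0.0)


def centripetal_loss(child, parent, c, margin=0.1):
    # Assumed margin form: a parent should sit closer to the
    # origin than its child, so more general entities are central.
    return max(hyperbolic_norm(parent, c) - hyperbolic_norm(child, c) + margin, 0.0)


# Toy 2-dimensional embeddings (hypothetical values, not model outputs).
d = 2
c = 1.0 / d  # assumption: curvature tied to the embedding dimension
parent = [0.1, 0.0]
child = [0.6, 0.1]
negative = [-0.7, 0.5]

print(clustering_loss(child, parent, negative, c))
print(centripetal_loss(child, parent, c))
```

In this toy configuration the parent lies near the origin and the child lies near its parent but far from the negative, so both losses are already satisfied. Swapping the parent and child roles in `centripetal_loss` gives a positive penalty, which is the signal that would push general entities inward during training.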

Code & Models

Repositories

krr-oxford/hierarchytransformers (PyTorch, official)

Videos

Language Models as Hierarchy Encoders · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Dropout · Linear Layer · Byte Pair Encoding · Adam