HDT: Hierarchical Document Transformer

Haoyu He; Markus Flicke; Jan Buchmann; Iryna Gurevych; Andreas Geiger

arXiv:2407.08330 · cs.LG · July 12, 2024

TL;DR

The paper introduces HDT, a hierarchical sparse Transformer that exploits document structure to improve both performance and efficiency when processing structured documents across various domains.

Contribution

HDT is a novel hierarchical sparse Transformer architecture that explicitly exploits document structure through auxiliary tokens and a multi-level attention hierarchy.
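
To make "auxiliary tokens" concrete, here is a minimal sketch of how a structured document could be flattened into one sequence with anchor tokens inserted at each level. It assumes a three-level hierarchy (sections, sentences, words) below a single document anchor. The anchor strings [DOC], [SEC], [SENT] and the function name linearize are hypothetical, chosen for illustration, and are not taken from the HDT code release. Each position records its level and the index of its parent anchor.

```python
from typing import List, Tuple

# Hypothetical anchor strings; HDT's real vocabulary may use different names.
DOC, SEC, SENT = "[DOC]", "[SEC]", "[SENT]"


def linearize(document: List[List[List[str]]]) -> Tuple[List[str], List[int], List[int]]:
    """Flatten sections -> sentences -> words into one token sequence.

    Returns (tokens, levels, parents): levels are 0 = document, 1 = section,
    2 = sentence, 3 = word; parents[i] is the index of token i's parent
    anchor, or -1 for the document anchor itself.
    """
    tokens: List[str] = []
    levels: List[int] = []
    parents: List[int] = []

    def add(token: str, level: int, parent: int) -> int:
        # Append one position and return its index in the flat sequence.
        tokens.append(token)
        levels.append(level)
        parents.append(parent)
        return len(tokens) - 1

    doc_idx = add(DOC, 0, -1)
    for section in document:
        sec_idx = add(SEC, 1, doc_idx)
        for sentence in section:
            sent_idx = add(SENT, 2, sec_idx)
            for word in sentence:
                add(word, 3, sent_idx)
    return tokens, levels, parents


doc = [[["HDT", "is", "sparse"]], [["It", "uses"], ["anchors"]]]
tokens, levels, parents = linearize(doc)
print(tokens)
print(parents)
```

Storing one parent index per position is simply a convenient way, in this sketch, to hand the hierarchy to whatever code later builds the attention pattern.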

Findings

1. Faster convergence on downstream tasks
2. Higher sample efficiency compared to existing models
3. Improved performance on structured document benchmarks

Abstract

In this paper, we propose the Hierarchical Document Transformer (HDT), a novel sparse Transformer architecture tailored for structured hierarchical documents. Such documents are extremely important in numerous domains, including science, law or medicine. However, most existing solutions are inefficient and fail to make use of the structure inherent to documents. HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy. This approach facilitates information exchange between tokens at different levels while maintaining sparsity, thereby enhancing computational and memory efficiency while exploiting the document structure as an inductive bias. We address the technical challenge of implementing HDT's sample-dependent hierarchical attention pattern by developing a novel sparse attention kernel that…
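
As a rough illustration of a sparse multi-level attention pattern, the sketch below builds a boolean mask from per-position parent indices and runs single-head attention only over allowed pairs. The attention rule is a simplifying assumption: each position attends to itself, its parent, its children, and its siblings (positions sharing its parent). The function names hierarchical_mask and masked_attention are hypothetical, and this is not the paper's sparse kernel.

```python
import math
from typing import List

Matrix = List[List[float]]


def hierarchical_mask(parents: List[int]) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if position i may attend to j.

    Simplified rule (an assumption, not HDT's exact kernel): a position sees
    itself, its parent, its children, and its siblings (same parent).
    """
    n = len(parents)
    return [
        [
            i == j or parents[i] == j or parents[j] == i or parents[i] == parents[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: List[List[bool]]) -> Matrix:
    """Single-head scaled dot-product attention restricted to allowed pairs."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for i, qi in enumerate(q):
        # Only keys permitted by the mask are scored; the rest are skipped.
        keys = [j for j in range(len(k)) if mask[i][j]]
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, keys)) / total
            for c in range(len(v[0]))
        ])
    return out


# Parent indices for: [DOC] -> 2 sections -> 3 sentences -> 6 words (-1 = root).
parents = [-1, 0, 1, 2, 2, 2, 0, 6, 7, 7, 6, 10]
mask = hierarchical_mask(parents)
n = len(parents)
print(f"{sum(map(sum, mask))} of {n * n} query-key pairs attended")
```

This sketch still scans all n² pairs to build the mask. Under this simplified pattern, words in different sentences never attend to each other directly; information can only flow between them through their sentence and section anchors. A dedicated sparse kernel, as the abstract describes, would avoid materializing and computing the masked-out entries altogether.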

Taxonomy

Topics: Advanced Computational Techniques and Applications · Video Analysis and Summarization · Web Data Mining and Analysis

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Multi-Head Attention · Softmax · Linear Warmup With Cosine Annealing · Residual Connection · Byte Pair Encoding