Long-Range Transformer Architectures for Document Understanding

Thibault Douzon; Stefan Duffner; Christophe Garcia; Jérémy Espinas

arXiv:2309.05503 · cs.CL · September 12, 2023

TL;DR

This paper introduces efficient long-range Transformer models for Document Understanding that can process entire multi-page documents with improved accuracy, and uses a 2D relative attention bias to guide attention toward relevant tokens.

Contribution

The paper presents two new multi-modal (text + layout) long-range Transformer architectures for Document Understanding (DU), and adds a 2D relative attention bias that helps attention focus on relevant tokens while keeping the models efficient.
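The page does not spell out how the 2D relative attention bias is computed. As a minimal sketch, assuming a scheme in the spirit of relative position biases: each token has an (x, y) page position, the signed horizontal and vertical offsets between two tokens are quantized into buckets, and a per-bucket scalar is added to the attention logit before the softmax. The `bucket` function, the bucket width, and the `bias_x`/`bias_y` tables are hypothetical; in a trained model the bias values would be learned, and the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def bucket(delta: float, num_buckets: int = 9, bucket_size: float = 50.0) -> int:
    """Quantize a signed page offset into one of num_buckets bins (hypothetical scheme).

    Offsets strictly between -bucket_size and +bucket_size share the middle bin;
    large offsets are clipped to the outermost bins.
    """
    half = num_buckets // 2
    b = int(delta / bucket_size)  # int() truncates toward zero
    return max(-half, min(half, b)) + half


def attention_2d_bias(
    q: Sequence[Vec], k: Sequence[Vec], v: Sequence[Vec],
    pos: Sequence[Tuple[float, float]], bias_x: Vec, bias_y: Vec,
) -> Tuple[List[Vec], List[Vec]]:
    """Single-head softmax attention with an additive 2D relative bias.

    pos holds one (x, y) page position per token, e.g. its box center.
    bias_x and bias_y hold one scalar per bucket (learned in a real model).
    Returns (outputs, attention weights).
    """
    n, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):  # every (i, j) pair: quadratic in sequence length
            score = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            logits.append(score + bias_x[bucket(dx)] + bias_y[bucket(dy)])
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in logits]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return outputs, weights


# Usage: three tokens on one line; the third is far to the right.
q = k = [[0.0, 0.0]] * 3  # zero queries/keys: only the bias shapes the weights
v = [[1.0], [2.0], [3.0]]
pos = [(0.0, 0.0), (10.0, 0.0), (500.0, 0.0)]
bias_x = [0.0] * 9
bias_x[4] = 2.0  # favor horizontally close tokens
bias_y = [0.0] * 9
out, w = attention_2d_bias(q, k, v, pos, bias_x, bias_y)
print([round(x, 3) for x in w[0]])  # [0.468, 0.468, 0.063]
```

With zero queries and keys the content term vanishes, so the first token splits most of its attention between itself and its close neighbor, while the distant token falls into a clipped bucket with no bonus.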

Findings

1. Long-range models outperform classical Transformer models such as LayoutLM on multi-page documents.
2. The 2D relative attention bias helps the model attend to relevant tokens.
3. The models stay efficient, with only small performance trade-offs.

Abstract

Since their release, Transformers have revolutionized many fields, from Natural Language Understanding to Computer Vision. Document Understanding (DU) was no exception, with the first Transformer-based models for DU dating from late 2019. However, the computational complexity of the self-attention operation, which grows quadratically with sequence length, limits these models to short sequences. In this paper we explore multiple strategies to apply Transformer-based models to long multi-page documents. We introduce two new multi-modal (text + layout) long-range models for DU, based on efficient implementations of Transformers for long sequences. Long-range models can effectively process whole documents at once and are less affected by document length. We compare them to LayoutLM, a classical Transformer adapted for DU and pre-trained on millions of documents. We further propose 2D relative attention bias to guide…
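To make the complexity argument concrete: standard self-attention scores every pair of tokens, so its cost grows with n² for a sequence of n tokens. The abstract does not name the efficient variants it builds on, so the sketch below swaps in one well-known alternative, kernelized linear attention with the feature map elu(x) + 1, purely for illustration; it is not necessarily what the paper uses. Because the same kernel replaces the softmax in both functions, the linear version reproduces the quadratic one exactly while never forming the n × n score matrix. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def phi(x: Vec) -> Vec:
    """Feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise, so always positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def quadratic_kernel_attention(q: Sequence[Vec], k: Sequence[Vec], v: Sequence[Vec]) -> List[Vec]:
    """Reference version: materializes all n x n pairwise scores, O(n^2) work."""
    fq, fk = [phi(x) for x in q], [phi(x) for x in k]
    out = []
    for qi in fq:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in fk]  # n scores per query
        z = sum(scores)
        out.append([sum(s * vj[c] for s, vj in zip(scores, v)) / z for c in range(len(v[0]))])
    return out


def linear_kernel_attention(q: Sequence[Vec], k: Sequence[Vec], v: Sequence[Vec]) -> List[Vec]:
    """Same result with work linear in n: keys and values are summarized once."""
    fk = [phi(x) for x in k]
    d, dv = len(fk[0]), len(v[0])
    # kv[a][c] = sum_j phi(k_j)[a] * v_j[c]: a d x dv matrix whose size does not depend on n
    kv = [[sum(kj[a] * vj[c] for kj, vj in zip(fk, v)) for c in range(dv)] for a in range(d)]
    # ksum[a] = sum_j phi(k_j)[a], used for normalization
    ksum = [sum(kj[a] for kj in fk) for a in range(d)]
    out = []
    for x in q:
        qi = phi(x)
        z = sum(qa * sa for qa, sa in zip(qi, ksum))
        out.append([sum(qi[a] * kv[a][c] for a in range(d)) / z for c in range(dv)])
    return out


q = [[0.5, -1.0], [2.0, 0.1], [-0.3, 0.7], [1.2, -0.4]]
k = [[1.0, 0.2], [-0.5, 0.9], [0.3, -1.1], [0.0, 0.4]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
slow = quadratic_kernel_attention(q, k, v)
fast = linear_kernel_attention(q, k, v)
print(all(abs(a - b) < 1e-9 for r1, r2 in zip(slow, fast) for a, b in zip(r1, r2)))  # True
```

The trick is reassociation: computing phi(Q)(phi(K)ᵀV) instead of (phi(Q)phi(K)ᵀ)V keeps the intermediate at d × dv rather than n × n, which is what lets a long-range model take in a whole multi-page document at once.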

Code & Models

Repositories

thibaultdouzon/long-range-document-transformer
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings