Understanding Long Documents with Different Position-Aware Attentions
Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang

TL;DR
This paper introduces novel position-aware attention mechanisms for long document understanding, enabling transformers to efficiently process extended multimodal inputs with improved performance.
Contribution
It proposes 1D and 2D position-aware attention methods that handle long documents with shortened context, adaptable to existing transformer architectures.
Findings
Proposed models outperform baselines on various metrics.
Models efficiently process long multimodal documents.
Attention modifications are easily integrated into existing transformers.
Abstract
Despite several successes in document understanding, the practical task of long document understanding remains largely under-explored because of computational challenges and the difficulty of efficiently absorbing long multimodal input. Most current transformer-based approaches handle only short documents and use solely textual information for attention, because attention over longer inputs is prohibitive in computation and memory. To address these issues in long document understanding, we explore different approaches to 1D and new 2D position-aware attention with an essentially shortened context. Experimental results show that our proposed models have advantages for this task across various evaluation metrics. Furthermore, our model changes only the attention and can therefore be easily adapted to any transformer-based architecture.
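The abstract does not spell out how the shortened context is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that "position-aware attention with shortened context" can be approximated by masking: in 1D a token attends only to tokens within a fixed index radius, and in 2D only to tokens whose layout centers lie within a fixed Euclidean distance. The function names, the `radius` parameter, and the use of box centers are all hypothetical choices made for illustration.

```python
import math

def window_mask_1d(n, radius):
    """Hypothetical 1D mask: token i may attend to token j if |i - j| <= radius."""
    return [[abs(i - j) <= radius for j in range(n)] for i in range(n)]

def window_mask_2d(centers, radius):
    """Hypothetical 2D mask: tokens attend to each other if their (x, y)
    layout centers are within `radius` (Euclidean distance)."""
    return [[math.dist(p, q) <= radius for q in centers] for p in centers]

def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted to the positions allowed by `mask`.
    Every row of `mask` must allow at least one position (the masks above
    always allow a token to attend to itself when radius >= 0)."""
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        allowed = [j for j, ok in enumerate(mask[i]) if ok]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in allowed]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][k] for w, j in zip(weights, allowed))
            for k in range(len(values[0]))
        ])
    return outputs

# Toy usage: three tokens, the third far away from the first two on the page.
centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask_2d = window_mask_2d(centers, radius=2.0)
out = masked_attention(x, x, x, mask_2d)
```

In this toy layout the third token has no neighbors within the radius, so it attends only to itself. With a mask like this, the number of scores computed per token depends on how many neighbors fall inside the window rather than on the full document length, which is one way a shortened context can reduce attention cost.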
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Natural Language Processing Techniques
