StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training
Kaustubh Ponkshe, Venkatapathy Subramanian, Natwar Modani, Ganesh Ramakrishnan

TL;DR
This paper investigates how incorporating document structure into transformer-based language models, through a new masked attention mechanism, affects pre-training and downstream task performance, and it underscores the value of structure-aware training.
Contribution
The paper introduces a structure-aware masked attention mechanism for transformers and empirically evaluates its effect on BERT pre-training and on downstream document understanding tasks.
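As a rough illustration of what a structure-aware attention mask can look like, the sketch below combines a local sliding window with global attention for tokens flagged as structural (for example, section headings). This is a generic local-plus-global pattern written under assumptions, not the paper's exact scheme: the function name `build_attention_mask`, the `is_global` flags, and the `window` parameter are hypothetical and do not come from the source.

```python
def build_attention_mask(is_global, window):
    """Return an n x n boolean mask; mask[i][j] is True if token i may attend to token j.

    is_global: list of bools, True for tokens assumed to carry document
               structure (hypothetical choice, e.g. section-heading tokens).
    window:    half-width of the local attention window (assumed parameter).
    """
    n = len(is_global)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Local attention: each token sees neighbours within the window.
            local = abs(i - j) <= window
            # Global attention: a structural token attends to every token,
            # and every token attends to it.
            mask[i][j] = local or is_global[i] or is_global[j]
    return mask


# Toy document of six tokens; position 0 stands in for a heading token.
is_global = [True, False, False, False, False, False]
mask = build_attention_mask(is_global, window=1)

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real model the mask would typically be applied by setting disallowed attention scores to a large negative value before the softmax; that step is omitted here to keep the sketch dependency-free.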
Findings
Global attention influences attention patterns during pre-training
Structure-aware pre-training improves document understanding performance
Incorporating document structure enhances model abstraction capabilities
Abstract
Most state-of-the-art techniques for Language Models (LMs) today rely on transformer-based architectures and their ubiquitous attention mechanism. However, the quadratic growth in computational requirements with longer input sequences confines Transformers to handling short passages. Recent efforts have aimed to address this limitation by introducing selective attention mechanisms, notably local and global attention. While sparse attention mechanisms, akin to full attention in being Turing-complete, have been theoretically established, their practical impact on pre-training remains unexplored. This study focuses on empirically assessing the influence of global attention on BERT pre-training. The primary steps involve creating an extensive corpus of structure-aware text through arXiv data, alongside a text-only counterpart. We carry out pre-training on these two datasets, investigate…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Adam · Residual Connection · Weight Decay · Softmax · Attention Is All You Need · Multi-Head Attention · Dense Connections · Dropout
