Position Masking for Improved Layout-Aware Document Understanding
Anik Saha, Catherine Finegan-Dollak, Ashish Verma

TL;DR
This paper introduces position masking as a new pre-training task for layout-aware document understanding models and reports a performance gain of over 5% on a form understanding task.
Contribution
It proposes position masking to improve layout-aware embeddings, demonstrating notable performance gains over language masking alone.
Findings
Position masking improves performance on a form understanding task by over 5%.
Models with position masking outperform those with only language masking.
Position masking enhances layout-aware document understanding models.
Abstract
Natural language processing for document scans and PDFs has the potential to enormously improve the efficiency of business processes. Layout-aware word embeddings such as LayoutLM have shown promise for classification of and information extraction from such documents. This paper proposes a new pre-training task called position masking that can improve the performance of layout-aware word embeddings that incorporate 2-D position embeddings. We compare models pre-trained with only language masking against models pre-trained with both language masking and position masking, and we find that position masking improves performance by over 5% on a form understanding task.
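To make the idea of position masking concrete, here is a minimal sketch in Python of how the input side of such a task might be prepared. The abstract does not give the masking rate, the placeholder value, or the form of the prediction target, so everything below is an assumption made for illustration: the function name `mask_positions`, the 15% rate, the all-zero placeholder box, and the choice to keep the original box as the target are hypothetical and should not be read as the authors' method.

```python
import random

# Hypothetical placeholder for a hidden 2-D position (assumption, not from the paper).
MASK_BOX = (0, 0, 0, 0)


def mask_positions(boxes, mask_prob=0.15, seed=None):
    """Hide the bounding boxes of a random subset of tokens.

    boxes: list of (x0, y0, x1, y1) tuples, one per token.
    mask_prob: chance of hiding each box (0.15 is an assumed value).
    Returns (masked_boxes, targets), where targets maps each masked
    token index to its original box, i.e. what a model would be asked
    to recover during pre-training.
    """
    rng = random.Random(seed)
    masked_boxes = []
    targets = {}
    for index, box in enumerate(boxes):
        if rng.random() < mask_prob:
            masked_boxes.append(MASK_BOX)  # the token text stays visible; only its position is hidden
            targets[index] = box
        else:
            masked_boxes.append(box)
    return masked_boxes, targets


# Usage: three tokens with made-up coordinates.
example_boxes = [(10, 10, 50, 20), (55, 10, 90, 20), (10, 30, 40, 40)]
masked, targets = mask_positions(example_boxes, mask_prob=0.5, seed=0)
```

In this sketch the word tokens are left untouched, which mirrors the contrast the abstract draws with language masking, where the text is hidden and the layout is not.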
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Music and Audio Processing
