Position Masking for Improved Layout-Aware Document Understanding

Anik Saha; Catherine Finegan-Dollak; Ashish Verma

arXiv:2109.00442·cs.CL·September 2, 2021

TL;DR

This paper introduces position masking, a new pre-training task for layout-aware document understanding models, and reports a performance improvement of over 5% on a form understanding task.

Contribution

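The paper proposes position masking as a pre-training task for layout-aware word embeddings that incorporate 2-D position embeddings, and reports a gain of over 5% on a form understanding task relative to pre-training with language masking alone.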

Findings

01

Position masking improves performance on a form understanding task by over 5%.

02

Models pre-trained with both language masking and position masking outperform models pre-trained with language masking alone.

03

Position masking targets layout-aware word embeddings that incorporate 2-D position embeddings.

Abstract

Natural language processing for document scans and PDFs has the potential to enormously improve the efficiency of business processes. Layout-aware word embeddings such as LayoutLM have shown promise for classification of and information extraction from such documents. This paper proposes a new pre-training task called position masking that can improve performance of layout-aware word embeddings that incorporate 2-D position embeddings. We compare models pre-trained with only language masking against models pre-trained with both language masking and position masking, and we find that position masking improves performance by over 5% on a form understanding task.
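
The abstract does not spell out how positions are masked, so the following is only a minimal sketch of one plausible reading, by analogy with language masking: hide the 2-D bounding box of a random subset of tokens and keep the original box as the prediction target. The function name `mask_positions`, the placeholder `MASK_BOX`, and the 15% default rate are all assumptions made for illustration, not details taken from the paper.

```python
import random

# Hypothetical placeholder box that stands in for a hidden position.
MASK_BOX = (0, 0, 0, 0)


def mask_positions(boxes, mask_prob=0.15, seed=None):
    """Hide the 2-D box of a random subset of tokens.

    boxes: list of (x0, y0, x1, y1) tuples, one per token.
    Returns (masked_boxes, targets). targets[i] holds the original box
    for a masked token and None for a token left untouched.
    """
    rng = random.Random(seed)
    masked_boxes, targets = [], []
    for box in boxes:
        if rng.random() < mask_prob:
            # Masked token: the model would have to recover this box.
            masked_boxes.append(MASK_BOX)
            targets.append(tuple(box))
        else:
            # Unmasked token: position passes through, no target.
            masked_boxes.append(tuple(box))
            targets.append(None)
    return masked_boxes, targets


if __name__ == "__main__":
    token_boxes = [(10, 20, 50, 40), (60, 20, 120, 40), (10, 60, 90, 80)]
    masked, targets = mask_positions(token_boxes, mask_prob=0.5, seed=0)
    print(masked)
    print(targets)
```

In a pre-training loop, the masked boxes would feed the 2-D position embeddings while the text tokens stay visible, and a loss would be computed only where the target is not `None`. How the paper actually encodes the masked position and which loss it uses is not stated in the abstract.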

Taxonomy

Topics: Handwritten Text Recognition Techniques · Topic Modeling · Music and Audio Processing