DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
Benno Uthayasooriyar, Antoine Ly, Franck Vermet, Caio Corro

TL;DR
DocPolarBERT is a novel layout-aware BERT model that uses relative polar coordinate encoding for document understanding, achieving state-of-the-art results with less pre-training data by enhancing self-attention mechanisms.
Contribution
Introduces a layout-aware BERT model with relative polar coordinate encoding that outperforms existing models despite smaller pre-training datasets.
Findings
Achieves state-of-the-art performance on document understanding tasks.
Reduces reliance on large-scale pre-training data.
Demonstrates effectiveness of polar coordinate-based attention mechanisms.
Abstract
We introduce DocPolarBERT, a layout-aware BERT model for document understanding that eliminates the need for absolute 2D positional embeddings. We extend self-attention to take into account text block positions in a relative polar coordinate system rather than a Cartesian one. Despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus, DocPolarBERT achieves state-of-the-art results. These results demonstrate that a carefully designed attention mechanism can compensate for reduced pre-training data, offering an efficient and effective alternative for document understanding.
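To make the idea of relative polar coordinates concrete, the sketch below computes, for every pair of text blocks, the distance and a discretized angle between them. This is an illustration only, not the paper's formulation: the abstract does not specify how positions are measured or discretized, so the use of bounding-box centers, the function name `relative_polar`, and the `num_angle_bins` parameter are all assumptions made here.

```python
import math

def relative_polar(boxes, num_angle_bins=8):
    """Pairwise relative polar coordinates between text blocks.

    boxes: list of (x0, y0, x1, y1) bounding boxes (hypothetical input format).
    Returns (dist, angle_bin), two n x n nested lists where entry [i][j]
    describes block j as seen from block i.
    Assumption: positions are taken at box centers and angles are split
    into equal-width bins; the paper may do this differently.
    """
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    n = len(centers)
    dist = [[0.0] * n for _ in range(n)]
    angle_bin = [[0] * n for _ in range(n)]
    bin_width = 2 * math.pi / num_angle_bins

    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            dx, dy = xj - xi, yj - yi
            # Radial component: Euclidean distance between centers.
            dist[i][j] = math.hypot(dx, dy)
            # Angular component, mapped into [0, 2*pi). In image coordinates
            # the y axis usually points down, so angles grow clockwise on the page.
            theta = math.atan2(dy, dx) % (2 * math.pi)
            # The final modulo guards against theta rounding up to exactly 2*pi.
            angle_bin[i][j] = int(theta // bin_width) % num_angle_bins
    return dist, angle_bin

# Three blocks with centers at (1, 1), (4, 1) and (1, 5).
boxes = [(0, 0, 2, 2), (3, 0, 5, 2), (0, 4, 2, 6)]
dist, angle_bin = relative_polar(boxes)
```

Because only differences between positions enter the computation, the output is unchanged if the whole page is translated, which is the property that lets a model dispense with absolute 2D positional embeddings. In a layout-aware attention layer, quantities like these would typically index learned terms that modify the attention scores; that step is omitted here because the summary does not describe it.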
Taxonomy
Topics: Handwritten Text Recognition Techniques · Music and Audio Processing · Image Processing and 3D Reconstruction
