DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures

Benno Uthayasooriyar; Antoine Ly; Franck Vermet; Caio Corro

arXiv:2507.08606 · cs.CL · January 23, 2026

TL;DR

DocPolarBERT is a layout-aware BERT model that incorporates the relative positions of text blocks into self-attention using polar rather than Cartesian coordinates. It achieves state-of-the-art results on document understanding tasks while using less pre-training data.

Contribution

Introduces a layout-aware BERT model with relative polar coordinate encoding that outperforms existing models despite being pre-trained on a dataset more than six times smaller than the IIT-CDIP corpus.

Findings

1. Achieves state-of-the-art performance on document understanding tasks.
2. Reduces reliance on large-scale pre-training data.
3. Demonstrates the effectiveness of attention mechanisms based on polar coordinates.

Abstract

We introduce DocPolarBERT, a layout-aware BERT model for document understanding that eliminates the need for absolute 2D positional embeddings. We extend self-attention to take into account text block positions in a relative polar coordinate system rather than a Cartesian one. Despite being pre-trained on a dataset more than six times smaller than the widely used IIT-CDIP corpus, DocPolarBERT achieves state-of-the-art results. These results demonstrate that a carefully designed attention mechanism can compensate for reduced pre-training data, offering an efficient and effective alternative for document understanding.
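The abstract describes the mechanism only at a high level. Below is a minimal sketch, built on our own assumptions, of one way self-attention logits can depend on the relative polar coordinates of text blocks. It computes the distance and angle between bounding-box centers, discretizes them into buckets, and adds a per-bucket bias to the attention scores. The box format `(x0, y0, x1, y1)`, the log-spaced distance buckets, the 8 angular sectors, and the names `relative_polar`, `polar_bucket`, and `polar_biased_attention` are all hypothetical illustrations. None of them is DocPolarBERT's actual formulation or code.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in page units, y growing downward
# as in typical OCR output, so positive angles point "down" the page.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0


def relative_polar(box_i: Box, box_j: Box) -> Tuple[float, float]:
    """Distance r and angle theta in (-pi, pi] of block j as seen from block i."""
    xi, yi = center(box_i)
    xj, yj = center(box_j)
    dx, dy = xj - xi, yj - yi
    return math.hypot(dx, dy), math.atan2(dy, dx)


def polar_bucket(r: float, theta: float, n_dist: int = 8, n_angle: int = 8) -> Tuple[int, int]:
    """Assumed discretization: log-spaced distance buckets and equal angular sectors."""
    # Distance: bucket 0 for r < 1, then one bucket per doubling, capped at n_dist - 1.
    d = 0 if r < 1.0 else min(n_dist - 1, 1 + int(math.log2(r)))
    # Angle: map to [0, 2*pi), then split into n_angle equal sectors.
    t = theta if theta >= 0 else theta + 2 * math.pi
    a = int(t / (2 * math.pi / n_angle)) % n_angle
    return d, a


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def polar_biased_attention(q: List[List[float]], k: List[List[float]],
                           v: List[List[float]], boxes: List[Box],
                           bias: List[List[float]]) -> List[List[float]]:
    """Single-head self-attention whose logits get a bias per (distance, angle) bucket.

    bias is an n_dist x n_angle table; in a real model it would be learned.
    """
    n, dim = len(q), len(q[0])
    n_dist, n_angle = len(bias), len(bias[0])
    out = []
    for i in range(n):
        logits = []
        for j in range(n):
            dot = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim)
            d, a = polar_bucket(*relative_polar(boxes[i], boxes[j]), n_dist, n_angle)
            logits.append(dot + bias[d][a])
        w = softmax(logits)
        # Weighted sum of value vectors with the biased attention weights.
        out.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30)]
    print(relative_polar(boxes[0], boxes[1]))  # (20.0, 0.0): block 1 is directly to the right
    print(polar_bucket(*relative_polar(boxes[0], boxes[1])))  # (5, 0)
```

Bucketing the angle into sectors lets one bias entry express a directional relation such as "to the right of" or "below" wherever the pair sits on the page. That is one plausible reason a relative polar encoding can replace absolute 2D embeddings, although the paper's exact design may differ.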

Code & Models

Models

buthaya/docpolarbert-base (Hugging Face model)

Videos

DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures (Underline)