PARL: Position-Aware Relation Learning Network for Document Layout Analysis

Fuyuan Liu; Dianyu Yu; He Ren; Nayu Liu; Xiaomian Kang; Delai Qiu; Fa Zhang; Genpeng Zhen; Shengping Liu; Jiaen Liang; Wei Huang; Yining Wang; Junnan Zhu

arXiv:2601.07620·cs.CV·January 13, 2026

Open Access

TL;DR

PARL is an OCR-free, vision-only document layout analysis network that models intrinsic visual structure through positional and relational features, achieving state-of-the-art results with high efficiency.

Contribution

PARL introduces a novel vision-only framework that uses positional and relational modeling and surpasses multimodal methods in both accuracy and efficiency.

Findings

01. Achieves state-of-the-art performance on DocLayNet.
02. Outperforms multimodal models on M6Doc.
03. Uses significantly fewer parameters than large multimodal models.

Abstract

Document layout analysis aims to detect and categorize structural elements (e.g., titles, tables, figures) in scanned or digital documents. Popular methods often rely on high-quality Optical Character Recognition (OCR) to merge visual features with extracted text. This dependency introduces two major drawbacks: propagation of text recognition errors and substantial computational overhead, limiting the robustness and practical applicability of multimodal approaches. In contrast to the prevailing multimodal trend, we argue that effective layout analysis depends not on text-visual fusion, but on a deep understanding of documents' intrinsic visual structure. To this end, we propose PARL (Position-Aware Relation Learning Network), a novel OCR-free, vision-only framework that models layout through positional sensitivity and relational structure. Specifically, we first introduce a…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques