Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

Fuyuan Liu; Dianyu Yu; He Ren; Nayu Liu; Xiaomian Kang; Delai Qiu; Fa Zhang; Genpeng Zhen; Shengping Liu; Jiaen Liang; Wei Huang; Yining Wang; Junnan Zhu

arXiv:2604.02692·cs.CV·April 6, 2026

TL;DR

This paper introduces a structural refinement module that stabilizes the parser interface in document layout analysis, improving layout quality and reducing sequence mismatch on dense, complex pages.

Contribution

It proposes a lightweight, set-level reasoning module that refines detector outputs and predicts parser input order, enhancing robustness in document parsing pipelines.
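The idea of a parser interface can be illustrated with a minimal sketch: a pool of layout hypotheses is filtered to a retained set, which is then serialized into parser input order. This is not the paper's module. The `Hypothesis` fields `keep_score` and `order_score`, the function `build_parser_input`, and the threshold value are all hypothetical names chosen for illustration, standing in for whatever the refinement stage actually predicts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hypothesis:
    """One layout hypothesis from a detector (illustrative structure only)."""
    label: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1)
    keep_score: float   # hypothetical: confidence that the instance is retained
    order_score: float  # hypothetical: lower value means earlier in parser input


def build_parser_input(pool: List[Hypothesis],
                       keep_threshold: float = 0.5) -> List[Hypothesis]:
    """Retain hypotheses above a threshold, then serialize them by order score.

    The parser only ever sees this retained, ordered list, so the retention
    decision and the ordering must agree with each other.
    """
    retained = [h for h in pool if h.keep_score >= keep_threshold]
    return sorted(retained, key=lambda h: h.order_score)


pool = [
    Hypothesis("text", (0, 100, 50, 200), keep_score=0.90, order_score=1.0),
    Hypothesis("title", (0, 0, 50, 20), keep_score=0.95, order_score=0.0),
    # Near-duplicate of the text block with a low retention score.
    Hypothesis("text", (1, 101, 50, 199), keep_score=0.20, order_score=1.1),
]
parser_input = build_parser_input(pool)
print([h.label for h in parser_input])
```

In this toy setup the near-duplicate text box is dropped before ordering, so the serialized sequence contains one title followed by one text block. If retention and ordering came from inconsistent hypotheses, the parser could receive duplicated or misordered regions, which is the failure mode the summary above describes.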

Findings

01 Improves page-level layout quality on public benchmarks.

02 Reduces sequence mismatch in end-to-end parsing pipelines.

03 Achieves a Reading Order Edit of 0.024 on OmniDocBench.
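To give a sense of what a reading order edit score measures, here is a minimal sketch that computes a normalized Levenshtein distance between a predicted and a reference ordering of layout element IDs. The exact matching and normalization used by OmniDocBench are not stated on this page, so dividing by the longer sequence length is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Classic Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_order_edit(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    denom = max(len(pred), len(ref))
    return edit_distance(pred, ref) / denom if denom else 0.0


# Two adjacent elements swapped in a four-element page.
print(normalized_order_edit([0, 2, 1, 3], [0, 1, 2, 3]))  # 0.5
```

Under this definition, 0.0 means the predicted order matches the reference exactly and lower is better, so a value near 0.024 would correspond to very few ordering errors per page.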

Abstract

Accurate document parsing requires both robust content recognition and a stable parser interface. In explicit Document Layout Analysis (DLA) pipelines, downstream parsers do not consume the full detector output. Instead, they operate on a retained and serialized set of layout instances. However, on dense pages with overlapping regions and ambiguous boundaries, unstable layout hypotheses can make the retained instance set inconsistent with its parser input order, leading to severe downstream parsing errors. To address this issue, we introduce a lightweight structural refinement stage between a DETR-style detector and the parser to stabilize the parser interface. Treating raw detector outputs as a compact hypothesis pool, the proposed module performs set-level reasoning over query features, semantic cues, box geometry, and visual evidence. From a shared refined structural state, it…
