OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning

Hengrui Kang; Zhuangcheng Gu; Zhiyuan Zhao; Zichen Wen; Bin Wang; Weijia Li; Conghui He

arXiv:2510.26213 · cs.CV · November 25, 2025


TL;DR

This paper introduces OmniDocLayout-1M, a million-scale dataset of diverse document layouts, and a two-stage coarse-to-fine LLM training approach for generating diverse, complex document layouts that go beyond the Manhattan-style structures of academic papers.

Contribution

It presents the first million-scale dataset of diverse document layouts and a two-stage Coarse-to-Fine LLM training paradigm for improved layout generation across multiple domains.

Findings

1. Outperforms existing layout generation methods and general-purpose LLMs
2. Achieves strong results on multiple domain datasets
3. Substantially improves diversity and complexity in generated layouts

Abstract

Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, layout generation, remains underexplored. Distinct from traditional graphic layout design and room layout planning, document layout generation typically involves a larger number of elements per page and exhibits greater structural diversity and complexity. Currently, a major obstacle lies in the scarcity of diverse document layouts: academic papers with Manhattan-style structures dominate existing studies, while open-world genres such as newspapers and magazines remain severely underrepresented. To address this gap, we curate OmniDocLayout-1M, the first million-scale dataset of diverse document layouts, covering six common document types and comprising contemporary layouts collected from multiple sources. Moreover,…
