HybriDLA: Hybrid Generation for Document Layout Analysis

Yufan Chen; Omar Moured; Ruiping Liu; Junwei Zheng; Kunyu Peng; Jiaming Zhang; Rainer Stiefelhagen

arXiv:2511.19919·cs.CV·November 26, 2025

TL;DR

HybriDLA is a novel generative framework combining diffusion and autoregressive decoding to improve document layout analysis, especially for complex and diverse modern documents, achieving state-of-the-art performance.

Contribution

The paper introduces HybriDLA, a unified generative model that integrates diffusion and autoregressive decoding for enhanced document layout analysis.

Findings

1. Achieves 83.5% mAP on benchmark datasets.

2. Outperforms previous state-of-the-art methods.

3. Effectively handles complex and diverse document layouts.

Abstract

Conventional document layout analysis (DLA) depends on empirical priors or a fixed set of learnable queries executed in a single forward pass. While sufficient for early-generation documents with a small, predetermined number of regions, this paradigm struggles with contemporary documents, which exhibit diverse element counts and increasingly complex layouts. To address challenges posed by modern documents, we present HybriDLA, a novel generative framework that unifies diffusion and autoregressive decoding within a single layer. The diffusion component iteratively refines bounding-box hypotheses, whereas the autoregressive component injects semantic and contextual awareness, enabling precise region prediction even in highly varied layouts. To further enhance detection quality, we design a multi-scale feature-fusion encoder that captures both fine-grained and high-level…
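The idea of iteratively refining bounding-box hypotheses can be illustrated with a toy sketch. This is not the paper's implementation: `toy_denoiser`, `refine_boxes`, the linear step schedule, and the normalized (cx, cy, w, h) box format are all assumptions made for illustration, and the learned decoder is replaced by an oracle that simply returns the target box. The autoregressive component is not modeled here.

```python
import random


def toy_denoiser(box, target):
    """Hypothetical stand-in for a learned decoder.

    A real model would predict a clean box from image features and the
    current noisy box; this oracle just returns the known target.
    """
    return list(target)


def refine_boxes(targets, steps=4, seed=0):
    """Refine random box hypotheses toward predictions over several steps.

    Boxes are assumed to be normalized (cx, cy, w, h) lists in [0, 1].
    """
    rng = random.Random(seed)
    # Start from pure noise: one random box hypothesis per target region.
    boxes = [[rng.random() for _ in range(4)] for _ in targets]
    for step in range(steps):
        # Assumed linear schedule: trust the prediction more at each step,
        # reaching full trust (alpha = 1.0) on the final step.
        alpha = (step + 1) / steps
        for box, target in zip(boxes, targets):
            pred = toy_denoiser(box, target)
            for k in range(4):
                box[k] = (1 - alpha) * box[k] + alpha * pred[k]
    return boxes


if __name__ == "__main__":
    targets = [[0.5, 0.5, 0.2, 0.1], [0.3, 0.7, 0.4, 0.2]]
    print(refine_boxes(targets))
```

Because the number of starting hypotheses is chosen at inference time rather than fixed by a set of learned queries, a scheme of this kind can in principle adapt to documents with varying element counts, which is the limitation the abstract attributes to single-pass approaches.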

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques