LLM-Guided Probabilistic Fusion for Label-Efficient Document Layout Analysis
Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

TL;DR
This paper introduces a probabilistic fusion framework that leverages large language model priors to improve semi-supervised document layout analysis, achieving high accuracy with minimal labeled data.
Contribution
It presents a novel fusion method combining visual predictions with LLM-derived structural priors, enhancing semi-supervised detection across different model scales.
Findings
Achieves 88.2 AP with only 5% labels on PubLayNet using a lightweight backbone.
With document-pretrained LayoutLMv3 (133M params), reaches 89.7 AP, surpassing LayoutLMv3 with standard semi-supervised learning (89.1 AP, p=0.02) and matching UDOP (89.8 AP).
Demonstrates the effectiveness of LLM priors in semantic disambiguation and privacy-preserving deployment.
Abstract
Document layout understanding remains data-intensive despite advances in semi-supervised learning. We present a framework that enhances semi-supervised detection by fusing visual predictions with structural priors from text-pretrained LLMs via principled probabilistic weighting. Given unlabeled documents, an OCR-LLM pipeline infers hierarchical regions, which are combined with teacher detector outputs through inverse-variance fusion to generate refined pseudo-labels. Our method demonstrates consistent gains across model scales. With a lightweight SwiftFormer backbone (26M params), we achieve 88.2±0.3 AP using only 5% labels on PubLayNet. When applied to document-pretrained LayoutLMv3 (133M params), our fusion framework reaches 89.7±0.4 AP, surpassing LayoutLMv3 with standard semi-supervised learning (89.1±0.4 AP, p=0.02) and matching UDOP (89.8 AP) which…
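The inverse-variance fusion named in the abstract can be illustrated with a minimal sketch. This is the textbook rule for combining two independent Gaussian estimates, not necessarily the authors' exact formulation. The function name, the box coordinates, and the variance values are all hypothetical and chosen only for illustration. The abstract does not say how the detector or LLM uncertainties are obtained.

```python
import numpy as np


def inverse_variance_fuse(mu_a, var_a, mu_b, var_b):
    """Fuse two independent estimates of the same quantity.

    Each estimate is weighted by the inverse of its variance, so the more
    certain source contributes more. Works elementwise on arrays.
    """
    mu_a, var_a, mu_b, var_b = (
        np.asarray(x, dtype=float) for x in (mu_a, var_a, mu_b, var_b)
    )
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused_var = 1.0 / (w_a + w_b)
    fused_mu = fused_var * (w_a * mu_a + w_b * mu_b)
    return fused_mu, fused_var


# Hypothetical box coordinates (x1, y1, x2, y2) and per-coordinate variances.
detector_box = [100.0, 50.0, 300.0, 120.0]  # assumed teacher detector output
detector_var = [4.0, 4.0, 4.0, 4.0]         # assumed: detector is more certain
llm_box = [110.0, 50.0, 310.0, 120.0]       # assumed OCR-LLM region proposal
llm_var = [12.0, 12.0, 12.0, 12.0]          # assumed: LLM prior is less certain

fused_box, fused_var = inverse_variance_fuse(
    detector_box, detector_var, llm_box, llm_var
)
print(fused_box)  # [102.5  50.  302.5 120. ]
print(fused_var)  # [3. 3. 3. 3.]
```

In this sketch the fused box stays closer to the lower-variance source, and the fused variance is smaller than either input variance, which is the property that makes the rule attractive for refining pseudo-labels.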
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
