A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal

TL;DR
This paper introduces a hybrid Transformer-based approach for document layout analysis that improves detection accuracy of various elements in document images, outperforming existing methods on multiple benchmarks.
Contribution
The paper proposes a novel graphical page object detector with a query encoding mechanism and a hybrid matching scheme, enhancing detection accuracy and efficiency in document layout analysis.
Findings
Achieves 97.3% AP on PubLayNet
Attains 81.6% AP on DocLayNet
Reaches 98.6% AP on PubTables
Abstract
Document layout analysis involves understanding the arrangement of elements within a document. This paper navigates the complexities of understanding various elements within document images, such as text, images, tables, and headings. The approach employs an advanced Transformer-based object detection network as an innovative graphical page object detector for identifying tables, figures, and displayed elements. We introduce a query encoding mechanism to provide high-quality object queries for contrastive learning, enhancing efficiency in the decoder phase. We also present a hybrid matching scheme that integrates the decoder's original one-to-one matching strategy with the one-to-many matching strategy during the training phase. This approach aims to improve the model's accuracy and versatility in detecting various graphical elements on a page. Our experiments on PubLayNet, DocLayNet,…
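The hybrid matching idea in the abstract can be illustrated with a small sketch. The abstract does not spell out the exact formulation, so this assumes a common variant in which the one-to-many branch repeats each ground-truth object k times and then runs the usual one-to-one assignment on the expanded set. The function names, the cost matrix, and the value of k are hypothetical, and the brute-force search stands in for the Hungarian algorithm only to keep the example dependency-free. For brevity both branches share one set of queries here, which may differ from the paper's setup.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Assign each target exactly one distinct query, minimizing total cost.

    cost[q][t] is the (hypothetical) matching cost between query q and
    target t. Brute force over permutations; fine for tiny examples only.
    """
    num_queries = len(cost)
    num_targets = len(cost[0]) if cost else 0
    best, best_total = (), float("inf")
    for queries in permutations(range(num_queries), num_targets):
        total = sum(cost[q][t] for t, q in enumerate(queries))
        if total < best_total:
            best_total, best = total, queries
    return sorted((q, t) for t, q in enumerate(best))


def one_to_many_match(cost, k):
    """Assign each target k distinct queries (assumed formulation).

    Each target column is repeated k times, one-to-one matching is run on
    the expanded matrix, and copies are mapped back to the original target.
    """
    num_targets = len(cost[0])
    expanded = [
        [row[t % num_targets] for t in range(num_targets * k)] for row in cost
    ]
    return sorted((q, t % num_targets) for q, t in one_to_one_match(expanded))


# Toy costs: 4 queries (rows) against 2 ground-truth objects (columns).
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.8, 0.3],
]

one_to_one = one_to_one_match(cost)        # pairs used by the one-to-one loss
one_to_many = one_to_many_match(cost, 2)   # pairs used by the auxiliary loss
print(one_to_one)
print(one_to_many)
```

In a training loop under this assumption, the loss over the one-to-one pairs and the loss over the one-to-many pairs would be summed, and only the one-to-one branch would be kept at inference time.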
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Digital Media Forensic Detection
