A Hybrid Approach for Document Layout Analysis in Document images

Tahira Shehzadi; Didier Stricker; Muhammad Zeshan Afzal

arXiv:2404.17888 · cs.CV · May 2, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a hybrid Transformer-based approach for document layout analysis that improves detection accuracy of various elements in document images, outperforming existing methods on multiple benchmarks.

Contribution

It proposes a novel graphical page object detector with a query encoding mechanism and hybrid matching scheme, enhancing detection accuracy and efficiency in document layout analysis.

Findings

01. Achieves 97.3% AP on PubLayNet

02. Attains 81.6% AP on DocLayNet

03. Reaches 98.6% AP on PubTables

Abstract

Document layout analysis involves understanding the arrangement of elements within a document. This paper navigates the complexities of understanding various elements within document images, such as text, images, tables, and headings. The approach employs an advanced Transformer-based object detection network as an innovative graphical page object detector for identifying tables, figures, and displayed elements. We introduce a query encoding mechanism to provide high-quality object queries for contrastive learning, enhancing efficiency in the decoder phase. We also present a hybrid matching scheme that integrates the decoder's original one-to-one matching strategy with the one-to-many matching strategy during the training phase. This approach aims to improve the model's accuracy and versatility in detecting various graphical elements on a page. Our experiments on PubLayNet, DocLayNet,…
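
The hybrid matching idea in the abstract can be illustrated with a small sketch. The abstract does not give the details, so this assumes one common way of building one-to-many matching in DETR-style detectors: repeat every ground-truth object k times and then run the usual one-to-one assignment. The function names, the cost values, and the brute-force search (standing in for the Hungarian algorithm) are all illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations

def one_to_one_match(cost):
    """Assign each ground-truth column to a distinct query row.

    cost[q][g] is the (hypothetical) matching cost between query q and
    ground-truth object g. Brute force over all choices, so only suitable
    for tiny examples; needs at least as many queries as columns.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    best_pairs, best_total = [], float("inf")
    # Each candidate picks one distinct query per ground-truth column.
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, g) for g, q in enumerate(queries)]
    return sorted(best_pairs)

def one_to_many_match(cost, k):
    """Let each ground-truth object receive k queries (assumed scheme).

    Every ground-truth column is repeated k times, one-to-one matching is
    run on the widened matrix, and column indices are mapped back.
    """
    num_gt = len(cost[0])
    repeated = [row * k for row in cost]  # columns: g0..gN, g0..gN, ...
    pairs = one_to_one_match(repeated)
    return sorted((q, g % num_gt) for q, g in pairs)

# Four queries, two ground-truth objects; lower cost means a better match.
cost = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.7, 0.3],
]
print(one_to_one_match(cost))      # one query per object
print(one_to_many_match(cost, 2))  # two queries per object
```

With one-to-one matching only two of the four queries receive a positive training signal; with k = 2 all four do. In a hybrid scheme of this kind, both assignments would contribute to the training loss, while inference keeps the one-to-one behaviour.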

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Digital Media Forensic Detection