Graphical Object Detection in Document Images

Ranajit Saha; Ajoy Mondal; C. V. Jawahar

arXiv:2008.10843 · cs.CV · August 26, 2020

TL;DR

This paper introduces GOD, a deep learning framework for localizing graphical objects like tables and figures in document images, leveraging transfer learning and domain adaptation to perform well on benchmark datasets.

Contribution

The paper presents a novel end-to-end trainable deep learning model for graphical object detection in documents that does not rely on heuristics or meta-data.

Findings

01

Achieves promising results on ICDAR-2013, ICDAR-POD2017, and UNLV datasets.

02

Outperforms existing state-of-the-art methods.

03

Effectively handles scarcity of labeled training data.

Abstract

Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document. Localizing such graphical objects in document images is therefore the first step toward understanding their content and that of the document images themselves. In this paper, we present a novel end-to-end trainable deep learning framework, called Graphical Object Detection (GOD), to localize graphical objects in document images. Our framework is data-driven and does not require any heuristics or meta-data to locate graphical objects in document images. GOD explores transfer learning and domain adaptation to handle the scarcity of labeled training images for the graphical object detection task in document images. Performance analysis carried out on various public benchmark data sets: ICDAR-2013, ICDAR-POD2017, and…
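The abstract frames the task as localization, which detection benchmarks commonly score by comparing predicted and ground-truth bounding boxes. The sketch below shows one widely used overlap measure, intersection-over-union (IoU). It is an illustration only: the excerpt above does not state the paper's evaluation protocol, and the `(x1, y1, x2, y2)` box format, the `iou` function name, the sample coordinates, and the 0.5 threshold are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    This box format is an assumption, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth table regions, in pixels.
predicted_table = (50, 100, 450, 300)
ground_truth = (60, 110, 440, 310)

score = iou(predicted_table, ground_truth)
# The 0.5 threshold is an illustrative choice, not the paper's setting.
is_match = score >= 0.5
```

A prediction would count as a correct detection under this sketch when `is_match` is true. Benchmarks differ in the threshold they use, so the value should be taken from the protocol of the data set being evaluated.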

Code & Models

Repositories

rnjtsh/graphical-object-detector
PyTorch · Official

Taxonomy

Topics

Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications