Graphical Object Detection in Document Images
Ranajit Saha, Ajoy Mondal, C. V. Jawahar

TL;DR
This paper introduces Graphical Object Detection (GOD), an end-to-end trainable deep learning framework that localizes graphical objects such as tables and figures in document images. It uses transfer learning and domain adaptation to cope with scarce labeled training data.
Contribution
The paper presents a novel end-to-end trainable deep learning model for graphical object detection in documents that does not rely on heuristics or meta-data.
Findings
Achieves promising results on ICDAR-2013, ICDAR-POD2017, and UNLV datasets.
Outperforms existing state-of-the-art methods.
Effectively handles scarcity of labeled training data.
Abstract
Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document. Localizing such graphical objects in document images is therefore the first step toward understanding the content of those objects or of the document images themselves. In this paper, we present a novel end-to-end trainable deep learning framework, called Graphical Object Detection (GOD), to localize graphical objects in document images. Our framework is data-driven and does not require any heuristics or meta-data to locate graphical objects. GOD uses transfer learning and domain adaptation to handle the scarcity of labeled training images for the graphical object detection task. Performance analysis carried out on various public benchmark data sets: ICDAR-2013, ICDAR-POD2017, and…
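Localization quality for detectors of this kind is commonly scored by the overlap between a predicted bounding box and a ground-truth box, known as intersection over union (IoU). The abstract above does not say which metric or threshold the authors use, so the following is only a minimal sketch of the general idea. The `Box` layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple

# Assumed box layout (hypothetical, not from the paper):
# (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield 0.0 rather than a division error.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted_table = (0.0, 0.0, 10.0, 10.0)
    ground_truth_table = (5.0, 0.0, 15.0, 10.0)
    print(iou(predicted_table, ground_truth_table))
```

A detection would typically count as correct when its IoU with a ground-truth table or figure exceeds a chosen threshold; the threshold itself depends on the benchmark protocol.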
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
