Extraction of Layout Entities and Sub-layout Query-based Retrieval of Document Images
Anukriti Bansal, Sumantra Dutta Roy, Gaurav Harit

TL;DR
This paper presents a graph-based, hash-indexed system for sub-layout retrieval in document images, supporting sketch-based queries and partial matching, with robustness to segmentation errors, demonstrated on newspaper images.
Contribution
It introduces a novel graph matching algorithm combined with hash indexing for efficient sub-layout retrieval, handling segmentation errors and partial matches.
Findings
Effective retrieval of sub-layouts in newspaper images.
Robustness to segmentation errors demonstrated.
Promising results on a dataset of 4776 images.
Abstract
Layouts and sub-layouts constitute an important clue while searching a document on the basis of its structure, or when textual content is unknown/irrelevant. A sub-layout specifies the arrangement of document entities within a smaller portion of the document. We propose an efficient graph-based matching algorithm, integrated with hash-based indexing, to prune a possibly large search space. A user can specify a combination of sub-layouts of interest using sketch-based queries. The system supports partial matching for unspecified layout entities. We handle cases of segmentation pre-processing errors (for text/non-text blocks) with a symmetry maximization-based strategy, and by accounting for multiple domain-specific plausible segmentation hypotheses. We show promising results of our system on a database of unstructured entities, containing 4776 newspaper images.
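The idea of using a hash index to prune the search space before graph matching can be illustrated with a small sketch. This is not the authors' algorithm: the block labels, the coarse spatial relations, the `(kind, relation, kind)` hash key, and every function name below are assumptions made for illustration. Each page is treated as a graph whose nodes are layout blocks and whose edges are pairwise spatial relations; the index maps each edge key to the pages that contain it, and a query sub-layout keeps only the pages that contain all of its edge keys.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A layout entity with a bounding box (y grows downward)."""
    kind: str  # hypothetical labels, e.g. "headline", "image", "text"
    x0: float
    y0: float
    x1: float
    y1: float


def relation(a, b):
    """Coarse spatial relation of block b relative to block a."""
    if b.y0 >= a.y1:
        return "below"
    if b.y1 <= a.y0:
        return "above"
    if b.x0 >= a.x1:
        return "right_of"
    if b.x1 <= a.x0:
        return "left_of"
    return "overlaps"


def edge_keys(blocks):
    """Hashable (kind, relation, kind) triples for every ordered block pair."""
    return {
        (a.kind, relation(a, b), b.kind)
        for i, a in enumerate(blocks)
        for j, b in enumerate(blocks)
        if i != j
    }


def build_index(pages):
    """Map each edge key to the set of page ids whose layout graph has it."""
    index = defaultdict(set)
    for page_id, blocks in pages.items():
        for key in edge_keys(blocks):
            index[key].add(page_id)
    return index


def candidates(index, query_blocks):
    """Pages containing every edge key of the query sub-layout."""
    keys = edge_keys(query_blocks)
    if not keys:
        return set()
    result = None
    for key in keys:
        hits = index.get(key, set())
        result = set(hits) if result is None else result & hits
    return result


# Two toy pages (assumed data, not from the paper's dataset).
pages = {
    "p1": [
        Block("headline", 0, 0, 100, 10),
        Block("image", 0, 10, 50, 60),
        Block("text", 50, 10, 100, 60),
    ],
    "p2": [
        Block("text", 0, 0, 100, 50),
        Block("text", 0, 50, 100, 100),
    ],
}
index = build_index(pages)

# Sketch-style query: a headline directly above an image.
query = [Block("headline", 0, 0, 10, 2), Block("image", 0, 2, 10, 8)]
matching_pages = candidates(index, query)
```

The index lookup only narrows the candidate set. A full system would still need to verify each candidate with an actual graph match, and the partial matching and segmentation-error handling described in the abstract are not modeled in this sketch.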
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Analysis and Summarization
