Multidomain Document Layout Understanding using Few Shot Object Detection

Pranaydeep Singh; Srikrishna Varadarajan; Ankit Narayan Singh; Muktabh; Mayank Srivastava

arXiv:1808.07330·cs.CV·August 23, 2018

TL;DR

This paper presents a transfer learning-based method for document layout understanding that generalizes across multiple domains with minimal training data, using few-shot object detection techniques.

Contribution

It introduces a simple, effective methodology combining pre-training on artificial data and fine-tuning on small domain-specific datasets for layout understanding.

Findings

01 Works with as few as 10 documents per domain

02 Outperforms simple object detectors

03 Demonstrates cross-domain generalization

Abstract

We address the problem of document layout understanding with a simple algorithm that generalizes across multiple domains while training on just a few examples per domain. We approach the problem as supervised object detection and propose a methodology that overcomes the need for large datasets. We use transfer learning: the object detector is pre-trained on a simple artificial (source) dataset and then fine-tuned on a tiny domain-specific (target) dataset. We show that this methodology works for multiple domains with as few as 10 training documents. We demonstrate the effect of each component of the methodology on the end result and show that the methodology outperforms simple object detectors.
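The two-stage schedule described in the abstract (pre-train on a large artificial source set, then fine-tune on about 10 target examples) can be illustrated with a toy sketch. This is not the paper's detector: a tiny logistic-regression classifier stands in for the object detector, and the 2-D synthetic data, the thresholds, the learning rates, and all function names are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_dataset(rng, n, threshold):
    # Hypothetical stand-in for a labelled dataset: 2-D points,
    # labelled 1 when x0 + x1 exceeds a domain-specific threshold.
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, 1 if x[0] + x[1] > threshold else 0))
    return data


def train(data, weights=None, lr=0.1, epochs=20):
    # Plain SGD on the logistic loss. Passing `weights` continues
    # training from an earlier model instead of starting from zeros.
    w = list(weights) if weights is not None else [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x0, x1), y in data:
            err = sigmoid(w[0] * x0 + w[1] * x1 + w[2]) - y
            w[0] -= lr * err * x0
            w[1] -= lr * err * x1
            w[2] -= lr * err
    return w


def accuracy(w, data):
    correct = sum(
        (w[0] * x0 + w[1] * x1 + w[2] > 0) == (y == 1)
        for (x0, x1), y in data
    )
    return correct / len(data)


rng = random.Random(0)
source = make_dataset(rng, 500, threshold=0.0)        # large artificial source set
target_train = make_dataset(rng, 10, threshold=0.5)   # tiny target set (10 samples)
target_test = make_dataset(rng, 200, threshold=0.5)

pretrained = train(source)                             # stage 1: pre-training
fine_tuned = train(target_train, weights=pretrained, lr=0.05)  # stage 2: fine-tuning
scratch = train(target_train, lr=0.05)                 # baseline without pre-training

print("fine-tuned accuracy on target:", accuracy(fine_tuned, target_test))
print("from-scratch accuracy on target:", accuracy(scratch, target_test))
```

The only difference between the fine-tuned model and the baseline is the initial weights, which is the comparison the abstract alludes to when it contrasts the methodology with a simple object detector. The toy numbers printed here say nothing about the paper's actual results.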
