SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

Subhajit Maity; Sanket Biswas; Siladittya Manna; Ayan Banerjee; Josep Lladós; Saumik Bhattacharya; Umapada Pal

arXiv:2305.00795 · cs.CV · August 22, 2023 · 1 citation

Open Access · 1 repository

TL;DR

SelfDocSeg introduces a vision-based self-supervised method for document segmentation that pre-trains on pseudo-layouts without labeled data, achieving competitive or superior results compared to supervised approaches.

Contribution

The paper presents a novel self-supervised, vision-only pre-training approach for document segmentation using pseudo-layouts, eliminating the need for labeled data.

Findings

01. Sets a new benchmark in document segmentation.

02. Performs on par or better than supervised methods.

03. Uses pseudo-layouts for self-supervised pre-training.

Abstract

Document layout analysis is a well-known problem in the document research community and has been explored extensively, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation and visual feature extraction. However, most existing works have ignored the scarcity of labeled data. As internet connectivity has spread into personal life, an enormous number of documents have become available in the public domain, making data annotation a tedious task. We address this challenge using self-supervision. Unlike the few existing self-supervised document segmentation approaches, which use text mining and textual labels, we use a completely vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to…
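The truncated abstract does not say how the pseudo-layouts are produced, so the following is only an illustrative sketch of one common label-free way to obtain layout-like regions from a page image: binarize, smear nearby ink together with a dilation, then take bounding boxes of connected components. It is not the authors' procedure. The function names, the threshold of 128, the dilation radii, and the toy `page` grid are all assumptions made for this example.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def dilate(mask, rx=1, ry=0):
    """Smear each ink pixel over a (2*ry+1) x (2*rx+1) window so nearby glyphs merge."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out

def pseudo_layout_boxes(mask):
    """Return inclusive boxes (x0, y0, x1, y1) of 4-connected ink regions, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(x, y)])
                x0 = x1 = x
                y0 = y1 = y
                while queue:  # breadth-first flood fill of one region
                    cx, cy = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes

# Hypothetical 7 x 12 grayscale page: 255 is paper, 0 is ink.
page = [[255] * 12 for _ in range(7)]
for x, y in ((1, 1), (3, 1), (8, 5), (10, 5)):
    page[y][x] = 0

ink = binarize(page)
boxes = pseudo_layout_boxes(dilate(ink, rx=1, ry=0))
print(boxes)
```

In this toy page the four isolated ink pixels are four separate components before dilation; the horizontal smear merges each pair into one region, which is the sense in which such boxes act as a rough layout without any annotation. A real pipeline would more likely use an image library's thresholding and morphology routines than pure-Python loops.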

Code & Models

Repositories

maitysubhajit/selfdocseg (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques