Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification

Taylor Archibald; Tony Martinez

arXiv:2405.14162·cs.CV·May 27, 2024

TL;DR

This paper introduces a novel approach combining semantic segmentation with deep learning embeddings to improve fine-grained, unsupervised classification of historical document forms, demonstrating significant accuracy improvements.

Contribution

To the authors' knowledge, this is the first work to evaluate embeddings on fine-grained, unsupervised form classification. It also proposes semantic segmentation as a preprocessing step to improve embedding quality.

Findings

01. Semantic segmentation improves clustering accuracy.

02. Embeddings effectively distinguish similar document types.

03. Proposed method outperforms baseline approaches.

Abstract

Efficient categorization of historical documents is crucial for fields such as genealogy, legal research, and historical scholarship, where manual classification is impractical for large collections due to its labor-intensive and error-prone nature. To address this, we propose a representational learning strategy that integrates semantic segmentation and deep learning models such as ResNet, CLIP, Document Image Transformer (DiT), and masked auto-encoders (MAE), to generate embeddings that capture document features without predefined labels. To the best of our knowledge, we are the first to evaluate embeddings on fine-grained, unsupervised form classification. To improve these embeddings, we propose to first employ semantic segmentation as a preprocessing step. We contribute two novel datasets (the French 19th-century and U.S. 1950 Census records) to…
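The pipeline the abstract describes (segment, embed, then group without labels) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `make_form` generates synthetic forms and hands back a ground-truth mask in place of a trained segmentation model, `embed` is an average-pooling stand-in for an encoder such as ResNet, CLIP, DiT, or MAE, and `kmeans` is a plain k-means used as one possible unsupervised grouping step.

```python
import numpy as np

def make_form(kind, rng, size=16):
    """Synthetic form: a printed rule (the template) plus random 'handwriting'.
    Hypothetical data; the returned mask stands in for a segmentation model's output."""
    template = np.zeros((size, size))
    if kind == 0:
        template[4, :] = 1.0   # form type 0: one horizontal rule
    else:
        template[:, 4] = 1.0   # form type 1: one vertical rule
    handwriting = (rng.random((size, size)) < 0.3).astype(float)
    image = np.clip(template + handwriting, 0.0, 1.0)
    return image, template.astype(bool)

def apply_mask(image, mask):
    """Keep only pixels the mask marks as preprinted form content."""
    return np.where(mask, image, 0.0)

def embed(image, grid=4):
    """Toy embedding: average-pool to grid x grid, flatten, L2-normalize.
    A stand-in for a learned encoder, not a real model."""
    h, w = image.shape
    cropped = image[: h - h % grid, : w - w % grid]
    pooled = cropped.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
    vec = pooled.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Ten synthetic forms, alternating between the two form types.
rng = np.random.default_rng(0)
kinds = [0, 1] * 5
forms = [make_form(kind, rng) for kind in kinds]

# Segment first, then embed, then cluster without using the labels.
X = np.stack([embed(apply_mask(img, mask)) for img, mask in forms])
labels = kmeans(X, k=2)
print(labels)
```

In this toy setting the mask removes the handwriting entirely, so forms of the same type map to identical embeddings. Real segmentation output is noisier, and the choice of encoder and clustering method matters far more than it does here.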

Taxonomy

Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Kaiming Initialization · Max Pooling · Average Pooling · Global Average Pooling · Linear Layer · Position-Wise Feed-Forward Layer · Convolution · Multi-Head Attention · Residual Connection