SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval

Marco Peer; Florian Kleber; Robert Sablatnig

arXiv:2404.17221 · cs.CV · April 29, 2024

TL;DR

SAGHOG introduces a self-supervised transformer-based approach for writer retrieval using HOG features, achieving state-of-the-art results on historical handwriting datasets.

Contribution

The paper presents a novel self-supervised pretraining method for writer retrieval that combines HOG features with vision transformers and NetRVLAD encoding.

Findings

01 Outperforms existing methods on HisFrag20 with 57.2% mAP

02 Achieves 58.0% Top-1 accuracy on GRK-Papyri

03 Demonstrates robustness on challenging historical datasets
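
The findings above are reported as mAP and Top-1 accuracy. As a rough illustration of how these two retrieval metrics are commonly computed, the sketch below uses a leave-one-out protocol: each document descriptor serves once as the query and all remaining documents are ranked by cosine distance. The protocol, the distance, and the function names (`cosine_distance`, `retrieval_metrics`) are assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two descriptor vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieval_metrics(descriptors, writers):
    """Leave-one-out mAP and Top-1 accuracy (hypothetical helper).

    descriptors: one global descriptor per document.
    writers:     writer label per document, same order.
    """
    average_precisions = []
    top1_hits = 0
    for q in range(len(descriptors)):
        # Rank every other document by distance to the query.
        others = [i for i in range(len(descriptors)) if i != q]
        ranked = sorted(
            others,
            key=lambda i: cosine_distance(descriptors[q], descriptors[i]),
        )
        relevant = [writers[i] == writers[q] for i in ranked]
        n_relevant = sum(relevant)
        if n_relevant == 0:
            continue  # no other document by this writer; skip the query
        top1_hits += int(relevant[0])
        # Average precision: mean of precision@k over the relevant ranks.
        found = 0
        precision_sum = 0.0
        for rank, is_relevant in enumerate(relevant, start=1):
            if is_relevant:
                found += 1
                precision_sum += found / rank
        average_precisions.append(precision_sum / n_relevant)
    n_queries = len(average_precisions)
    return sum(average_precisions) / n_queries, top1_hits / n_queries


# Toy example: two writers, two documents each.
toy_descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_writers = ["a", "a", "b", "b"]
mean_ap, top1 = retrieval_metrics(toy_descriptors, toy_writers)
```

In this toy case every query's nearest neighbour is written by the same writer, so both metrics are perfect. Shuffling the labels lowers them.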

Abstract

This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval using HOG features of the binarized input image. Our preprocessing involves the application of the Segment Anything technique to extract handwriting from various datasets, ending up with about 24k documents, followed by training a vision transformer on reconstructing masked patches of the handwriting. SAGHOG is then finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation of our approach on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate un- and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2 % - a margin of 11.6 % to the current state of the art, showcasing…
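
The abstract describes pretraining on HOG features of the binarized image while reconstructing masked patches. To make the HOG part concrete, the sketch below computes a single magnitude-weighted orientation histogram for one binarized patch in plain Python. It is a simplified illustration only: the nine unsigned orientation bins, the central-difference gradients, the L2 normalisation, and the name `hog_cell` are assumptions, since this page does not state the paper's actual HOG configuration. One plausible reading is that such a descriptor, computed for each masked patch, serves as the reconstruction target.

```python
import math


def hog_cell(patch, n_bins=9):
    """Orientation histogram of one binarized patch (hypothetical helper).

    patch:  list of rows with values 0 (background) or 1 (ink).
    n_bins: number of unsigned orientation bins over [0, 180) degrees;
            9 is an assumed value, not taken from the paper.
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(height):
        for x in range(width):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, width - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, height - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation: fold the angle into [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = min(int(angle / (180.0 / n_bins)), n_bins - 1)
            hist[bin_index] += magnitude
    # L2-normalise so the descriptor does not depend on stroke density.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical stroke edge: all gradient energy is horizontal (bin 0).
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
edge_descriptor = hog_cell(vertical_edge)

# A blank patch has no gradients and yields an all-zero histogram.
blank_descriptor = hog_cell([[0, 0, 0, 0] for _ in range(4)])
```

A full HOG pipeline would typically split each patch into several cells and normalise over blocks of cells. This sketch keeps one cell per patch for brevity.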

Code & Models

Repositories

marco-peer/icdar24
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Multi-Head Attention · Residual Connection · Softmax · Vision Transformer