SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval
Marco Peer, Florian Kleber, Robert Sablatnig

TL;DR
SAGHOG introduces a self-supervised transformer-based approach for writer retrieval using HOG features, achieving state-of-the-art results on historical handwriting datasets.
Contribution
The paper presents a novel self-supervised pretraining method for writer retrieval that combines HOG features with vision transformers and NetRVLAD encoding.
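NetRVLAD, used here as the encoding layer, is the residual-less variant of NetVLAD. Each local descriptor is softly assigned to K clusters, and the cluster-weighted descriptors are summed without subtracting cluster centers. The following is a minimal NumPy sketch, assuming the encoder outputs N patch tokens of width D. The token count, width, cluster count, and the randomly drawn assignment weights (learned in the real model) are hypothetical and do not reflect the paper's configuration.

```python
import numpy as np

def netrvlad(tokens: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Aggregate N local descriptors (N x D) into one global descriptor of size K*D.

    tokens:  (N, D) patch embeddings from the encoder (assumed shape)
    weights: (D, K) soft-assignment projection (learned in practice, random here)
    bias:    (K,)   soft-assignment bias
    """
    logits = tokens @ weights + bias                       # (N, K)
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)            # softmax over clusters
    # Residual-less aggregation: weighted sum of descriptors, no cluster centers
    vlad = assign.T @ tokens                               # (K, D)
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12  # intra-normalization
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)           # global L2 normalization

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))   # hypothetical: 196 patch tokens of width 64
weights = rng.normal(size=(64, 8))    # hypothetical: 8 clusters, random instead of learned
bias = np.zeros(8)
descriptor = netrvlad(tokens, weights, bias)
print(descriptor.shape)  # (512,)
```

Because the aggregation is a sum over tokens, the resulting descriptor does not depend on token order. This lets a variable number of patches map to one fixed-length vector per document.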
Findings
Outperforms existing methods on HisFrag20 with 57.2% mAP, a margin of 11.6% over the previous state of the art
Achieves 58.0% Top-1 accuracy on GRK-Papyri
Demonstrates effectiveness on three challenging historical datasets: Historical-WI, HisFrag20, and GRK-Papyri
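Writer retrieval metrics such as mAP and Top-1 accuracy are commonly computed leave-one-out: each document's global descriptor queries all other documents, which are ranked by cosine similarity. The sketch below assumes that protocol. The function name and toy data are hypothetical, and the paper's exact evaluation code may differ.

```python
import numpy as np

def retrieval_metrics(descs: np.ndarray, writers: np.ndarray) -> tuple[float, float]:
    """Leave-one-out writer retrieval: every document queries all others.

    Returns (Top-1 accuracy, mAP). Hypothetical helper for illustration.
    """
    descs = descs / np.linalg.norm(descs, axis=1, keepdims=True)
    sims = descs @ descs.T                       # cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)              # never retrieve the query itself
    top1_hits, aps = [], []
    for q in range(len(descs)):
        order = np.argsort(-sims[q])[:-1]        # query lands last; drop it
        relevant = writers[order] == writers[q]  # same writer as the query?
        top1_hits.append(relevant[0])
        if relevant.any():
            ranks = np.flatnonzero(relevant) + 1              # 1-based ranks of hits
            precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
            aps.append(precision_at_hits.mean())              # average precision
    return float(np.mean(top1_hits)), float(np.mean(aps))

descs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(retrieval_metrics(descs, np.array([0, 0, 1, 1])))  # (1.0, 1.0)
```

Top-1 only checks whether the nearest neighbor shares the query's writer. mAP additionally rewards ranking all of that writer's other documents early.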
Abstract
This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval using HOG features of the binarized input image. Our preprocessing applies the Segment Anything technique to extract handwriting from various datasets, yielding about 24k documents, and a vision transformer is then trained to reconstruct masked patches of the handwriting. SAGHOG is subsequently finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2%, a margin of 11.6% over the current state of the art, showcasing…
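The pretraining step reconstructs masked patches using HOG features of the binarized image. As a rough sketch of what such a reconstruction target could look like, the NumPy code below computes a simplified HOG descriptor per 16x16 patch and an MAE-style loss averaged only over masked patches. The descriptor is a single magnitude-weighted orientation histogram, without the block normalization of full HOG. The patch size, 9 orientation bins, 75% mask ratio, toy stroke image, and the zero array standing in for the decoder output are all assumptions for illustration, not SAGHOG's actual configuration.

```python
import numpy as np

PATCH = 16  # assumed ViT-style patch size
BINS = 9    # assumed number of unsigned orientation bins

def patch_hog(img: np.ndarray) -> np.ndarray:
    """Simplified HOG: one magnitude-weighted orientation histogram per patch."""
    gy, gx = np.gradient(img.astype(float))               # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned orientation in [0, 180)
    bins = np.minimum((ang / (180.0 / BINS)).astype(int), BINS - 1)
    h, w = img.shape[0] // PATCH, img.shape[1] // PATCH
    feats = np.zeros((h * w, BINS))
    for r in range(h):
        for c in range(w):
            sl = (slice(r * PATCH, (r + 1) * PATCH), slice(c * PATCH, (c + 1) * PATCH))
            hist = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(), minlength=BINS)
            feats[r * w + c] = hist / (np.linalg.norm(hist) + 1e-6)  # per-patch L2 norm
    return feats

def masked_hog_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error computed over masked patches only (mask == 1)."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float((per_patch * mask).sum() / mask.sum())

img = np.zeros((64, 64))
img[20:44, 30:34] = 1.0                     # hypothetical binarized vertical stroke
target = patch_hog(img)                     # 4 x 4 patches -> (16, 9)
mask = np.zeros(16)
mask[np.random.default_rng(0).permutation(16)[:12]] = 1.0  # assumed 75% mask ratio
pred = np.zeros_like(target)                # stand-in for the decoder's prediction
print(target.shape)  # (16, 9)
print(masked_hog_loss(pred, target, mask))
```

Restricting the loss to masked patches forces the encoder to infer stroke orientation statistics from visible context rather than copying its input.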
