# Unsupervised Feature Learning for Writer Identification and Writer Retrieval

**Authors:** Vincent Christlein, Martin Gropp, Stefan Fiel, Andreas Maier

arXiv: 1705.09369 · 2024-02-28

## TL;DR

This paper introduces an unsupervised method for learning CNN features for writer identification and retrieval, using clustering to create surrogate classes, and demonstrates its effectiveness on historical document datasets.

## Contribution

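The paper presents an unsupervised approach to training CNNs for writer identification: surrogate classes obtained by clustering the training data stand in for writer labels, and the resulting activation features outperform the descriptors of state-of-the-art writer identification methods.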

## Key findings

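- Activation features learned without supervision outperform the descriptors of state-of-the-art writer identification methods on the ICDAR17 Historical-WI dataset.
- Comparable results are achieved for handwriting classification on the ICFHR16 CLaMM16 dataset of historical Latin script types.
- The method is effective on both historical document datasets evaluated.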

## Abstract

Deep Convolutional Neural Networks (CNN) have shown great success in supervised classification tasks such as character classification or dating. Deep learning methods typically need a lot of annotated training data, which is not available in many scenarios. In these cases, traditional methods are often better than or equivalent to deep learning methods. In this paper, we propose a simple, yet effective, way to learn CNN activation features in an unsupervised manner. Therefore, we train a deep residual network using surrogate classes. The surrogate classes are created by clustering the training dataset, where each cluster index represents one surrogate class. The activations from the penultimate CNN layer serve as features for subsequent classification tasks. We evaluate the feature representations on two publicly available datasets. The focus lies on the ICDAR17 competition dataset on historical document writer identification (Historical-WI). We show that the activation features trained without supervision are superior to descriptors of state-of-the-art writer identification methods. Additionally, we achieve comparable results in the case of handwriting classification using the ICFHR16 competition dataset on historical Latin script types (CLaMM16).
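The surrogate-class step described in the abstract can be illustrated with a small sketch. The summary does not say which local descriptors or which clustering algorithm the authors use, so everything below is an assumption chosen for illustration: plain k-means with a farthest-point initialisation, 2-D toy vectors in place of real patch descriptors, and the hypothetical function name `kmeans_surrogate_labels`. In a full pipeline the resulting labels would serve as training targets for the residual network, and the penultimate-layer activations would then be used as features.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))


def kmeans_surrogate_labels(descriptors, num_clusters, iterations=20):
    """Cluster descriptors; each cluster index acts as one surrogate class.

    Assumption: k-means with farthest-point initialisation. The paper's
    actual clustering procedure is not given in this summary.
    """
    # Farthest-point initialisation: start with the first descriptor, then
    # repeatedly add the descriptor farthest from all centers chosen so far.
    centers = [list(descriptors[0])]
    while len(centers) < num_clusters:
        farthest = max(
            descriptors,
            key=lambda d: min(squared_distance(d, c) for c in centers),
        )
        centers.append(list(farthest))

    for _ in range(iterations):
        labels = [nearest(d, centers) for d in descriptors]
        for k in range(num_clusters):
            members = [d for d, lab in zip(descriptors, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                dim = len(members[0])
                centers[k] = [sum(m[i] for m in members) / len(members) for i in range(dim)]

    # Final assignment so the labels match the returned centers.
    labels = [nearest(d, centers) for d in descriptors]
    return labels, centers


# Toy stand-in for patch descriptors: two well-separated 2-D blobs.
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(20)]
blob_b = [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(20)]
descriptors = blob_a + blob_b

labels, centers = kmeans_surrogate_labels(descriptors, num_clusters=2)

# (descriptor, surrogate class) pairs: the supervised-style training set
# that a CNN could be trained on without any writer annotations.
training_pairs = list(zip(descriptors, labels))
```

The point of the sketch is only that no writer annotation enters the labelling: the targets come entirely from the structure of the data itself.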

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.09369/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1705.09369/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1705.09369/full.md

---
Source: https://tomesphere.com/paper/1705.09369