Encoding CNN Activations for Writer Recognition

Vincent Christlein; Andreas Maier

arXiv:1712.07923 · cs.CV · January 16, 2018

TL;DR

This paper compares VLAD with triangulation embedding for encoding CNN activations in writer recognition, and investigates generalized max pooling, decorrelation, and Exemplar SVMs to improve identification accuracy on two public datasets (ICDAR13, KHATT).

Contribution

It evaluates encoding strategies (VLAD, triangulation embedding) and pooling strategies (sum pooling, generalized max pooling) for CNN activation features, and sets new standards in writer recognition on ICDAR13 and KHATT.

Findings

01. Triangulation embedding outperforms VLAD in writer recognition.

02. Generalized max pooling improves feature aggregation.

03. Decorrelated features enhance identification performance.

Abstract

The encoding of local features is an essential part of writer identification and writer retrieval. While CNN activations have already been used as local features in related works, the encoding of these features has attracted little attention so far. In this work, we compare the established VLAD encoding with triangulation embedding. We further investigate generalized max pooling as an alternative to sum pooling and the impact of decorrelation and Exemplar SVMs. With these techniques, we set new standards on two publicly available datasets (ICDAR13, KHATT).
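
To make the difference between sum pooling and generalized max pooling concrete, here is a minimal NumPy sketch of VLAD encoding with a pluggable pooling step. It is an illustration under stated assumptions, not the authors' implementation: the function names, the ridge parameter `lam`, the per-cluster application of generalized max pooling, and the power and L2 normalization at the end are all choices made for this example. Decorrelation and Exemplar SVMs are left out.

```python
import numpy as np


def sum_pool(residuals):
    """Sum pooling: add up all residuals assigned to one cluster."""
    return residuals.sum(axis=0)


def generalized_max_pool(residuals, lam=1.0):
    """Generalized max pooling (ridge form), applied to one cluster.

    Finds xi that minimizes ||R xi - 1||^2 + lam * ||xi||^2, so that every
    residual contributes roughly equally to the pooled vector.
    `lam` is an assumed regularization value, not taken from the paper.
    """
    dim = residuals.shape[1]
    gram = residuals.T @ residuals + lam * np.eye(dim)
    # Closed form: (R^T R + lam I) xi = R^T 1, and R^T 1 is the column sum.
    return np.linalg.solve(gram, residuals.sum(axis=0))


def vlad_encode(descriptors, centers, pool=sum_pool):
    """Encode local descriptors (N x D) against cluster centers (K x D)."""
    # Hard-assign each descriptor to its nearest center.
    diffs = descriptors[:, None, :] - centers[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    k, dim = centers.shape
    encoding = np.zeros((k, dim))
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members) > 0:
            # Pool the residuals between descriptors and their center.
            encoding[j] = pool(members - centers[j])

    # Assumed post-processing: power normalization, then L2 normalization.
    vec = encoding.ravel()
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


rng = np.random.default_rng(0)
local_features = rng.normal(size=(200, 8))   # stand-in for CNN activations
codebook = rng.normal(size=(4, 8))           # stand-in for k-means centers

vlad_sum = vlad_encode(local_features, codebook, pool=sum_pool)
vlad_gmp = vlad_encode(local_features, codebook, pool=generalized_max_pool)
```

Both calls return a vector of length K x D (32 here). With sum pooling, a cluster's entry is dominated by frequently occurring descriptors; the generalized max pooling variant instead solves a small regularized linear system per cluster so that rare and frequent descriptors are weighted more evenly.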

Taxonomy

Methods: Max Pooling