Encoding CNN Activations for Writer Recognition
Vincent Christlein, Andreas Maier

TL;DR
This paper compares VLAD and triangulation embedding for encoding CNN activations in writer recognition, and investigates generalized max pooling, decorrelation, and Exemplar SVMs to improve identification accuracy on two public datasets (ICDAR13, KHATT).
Contribution
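It evaluates encoding and pooling strategies for CNN activations (VLAD versus triangulation embedding, sum pooling versus generalized max pooling), together with decorrelation and Exemplar SVMs, and reports new best results on ICDAR13 and KHATT.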
Findings
Triangulation embedding outperforms VLAD in writer recognition.
Generalized max pooling improves feature aggregation.
Decorrelated features enhance identification performance.
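As a rough illustration of the pooling finding, the sketch below contrasts sum pooling with generalized max pooling (GMP) in NumPy. It follows the standard ridge-regression formulation of GMP, where the pooled vector is chosen so that every local embedding has roughly the same dot product with it. The function names, the regularization value `lam`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def sum_pooling(phi):
    """Sum the local embeddings. phi has shape (N, D)."""
    return phi.sum(axis=0)


def generalized_max_pooling(phi, lam=1.0):
    """Generalized max pooling via ridge regression (dual form).

    Finds xi minimizing ||phi @ xi - 1||^2 + lam * ||xi||^2, so that each
    local embedding contributes about equally to the pooled vector.
    `lam` is an illustrative default, not a value from the paper.
    """
    n = phi.shape[0]
    kernel = phi @ phi.T                      # (N, N) Gram matrix
    alpha = np.linalg.solve(kernel + lam * np.eye(n), np.ones(n))
    return phi.T @ alpha                      # (D,) pooled representation


# Toy data: one embedding repeated ten times, one rare embedding.
frequent = np.tile([1.0, 0.0, 0.0], (10, 1))
rare = np.array([[0.0, 1.0, 0.0]])
phi = np.vstack([frequent, rare])

pooled_sum = sum_pooling(phi)
pooled_gmp = generalized_max_pooling(phi, lam=1e-6)
```

With sum pooling the frequent embedding dominates the result, whereas GMP weights the frequent and the rare embedding almost equally, which is the motivation usually given for preferring it over sum pooling.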
Abstract
The encoding of local features is an essential part of writer identification and writer retrieval. While CNN activations have already been used as local features in related works, the encoding of these features has attracted little attention so far. In this work, we compare the established VLAD encoding with triangulation embedding. We further investigate generalized max pooling as an alternative to sum pooling and the impact of decorrelation and Exemplar SVMs. With these techniques, we set new standards on two publicly available datasets (ICDAR13, KHATT).
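For readers unfamiliar with VLAD, the following is a minimal NumPy sketch of the basic encoding with sum pooling: each local descriptor is assigned to its nearest codebook center, and the residuals are summed per center and concatenated. The codebook, the toy descriptors, and the power normalization with exponent 0.5 are assumptions for illustration only. The paper's actual pipeline (CNN architecture, codebook training, decorrelation, Exemplar SVMs) is not described in this summary and is not reproduced here.

```python
import numpy as np


def vlad_encode(descriptors, centers, power=0.5):
    """Basic VLAD encoding with sum pooling of residuals.

    descriptors: (N, D) local features (e.g. CNN activations).
    centers:     (K, D) codebook, assumed to be precomputed (e.g. by k-means).
    Returns a (K * D,) vector, power-normalized and L2-normalized.
    """
    # Squared Euclidean distance from every descriptor to every center.
    diffs = descriptors[:, None, :] - centers[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    k, d = centers.shape
    vlad = np.zeros((k, d))
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members):
            # Sum of residuals to the assigned center.
            vlad[j] = (members - centers[j]).sum(axis=0)

    vlad = vlad.ravel()
    # Power normalization (assumed step), then L2 normalization.
    vlad = np.sign(vlad) * np.abs(vlad) ** power
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


# Hypothetical toy input: two centers, three 2-D descriptors.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]])
encoding = vlad_encode(descriptors, centers)
```

The resulting global vector can then be compared between documents, for example with the cosine distance, to rank candidate writers.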
Taxonomy
Methods: Max Pooling
