End-to-end Learning of a Fisher Vector Encoding for Part Features in Fine-grained Recognition
Dimitri Korsch, Paul Bodesheim, Joachim Denzler

TL;DR
This paper introduces an end-to-end trainable Fisher vector encoding for part features in fine-grained recognition. It improves accuracy by representing local features in a way that is invariant to the order of parts and can handle a varying number of visible parts.
Contribution
It proposes a novel Fisher vector encoding integrated into CNNs, jointly trained with an online EM algorithm for better local feature representation.
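The idea of estimating mixture parameters online, one mini-batch at a time, can be sketched as follows. This is a generic stepwise EM that keeps exponential moving averages of sufficient statistics. It is an assumption made for illustration and not necessarily the update rule used in the paper. The names `init_stats`, `params_from_stats`, `online_em_step`, and the `momentum` parameter are hypothetical, and the CNN that would supply the features is left out.

```python
import numpy as np

def init_stats(weights, means, variances):
    # Sufficient statistics that reproduce the given GMM parameters.
    s0 = weights.copy()                             # (K,)
    s1 = weights[:, None] * means                   # (K, D)
    s2 = weights[:, None] * (variances + means**2)  # (K, D)
    return s0, s1, s2

def params_from_stats(stats, eps=1e-6):
    # Closed-form M-step: recover weights, means, diagonal variances.
    s0, s1, s2 = stats
    s0c = np.maximum(s0, eps)[:, None]
    weights = s0 / s0.sum()
    means = s1 / s0c
    variances = np.maximum(s2 / s0c - means**2, eps)
    return weights, means, variances

def responsibilities(x, weights, means, variances):
    # E-step: posterior probability of each component for each sample.
    diff = x[:, None, :] - means[None, :, :]                    # (N, K, D)
    log_prob = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)             # stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)               # (N, K)

def online_em_step(batch, stats, momentum=0.9):
    # One online update: E-step on the batch, then blend the batch
    # statistics into the running statistics (assumed EMA scheme).
    weights, means, variances = params_from_stats(stats)
    post = responsibilities(batch, weights, means, variances)
    n = batch.shape[0]
    b0 = post.sum(axis=0) / n
    b1 = post.T @ batch / n
    b2 = post.T @ batch**2 / n
    s0, s1, s2 = stats
    return (momentum * s0 + (1 - momentum) * b0,
            momentum * s1 + (1 - momentum) * b1,
            momentum * s2 + (1 - momentum) * b2)

# Toy usage: two clusters in 2-D, updated batch by batch.
rng = np.random.default_rng(0)
stats = init_stats(np.array([0.5, 0.5]),
                   np.array([[-1.0, -1.0], [1.0, 1.0]]),
                   np.ones((2, 2)))
for _ in range(200):
    centers = rng.choice([-5.0, 5.0], size=(64, 1))
    batch = centers + rng.normal(size=(64, 2))
    stats = online_em_step(batch, stats)
weights, means, variances = params_from_stats(stats)
```

Tracking sufficient statistics instead of averaging the parameters directly means that a component receiving no responsibility in a batch simply loses weight rather than having its mean pulled toward an arbitrary value.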
Findings
Improved accuracy on bird species datasets
Effective handling of occlusions and viewpoint variations
Superior to previous part-based methods
Abstract
Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, although they explicitly focus on small details that are relevant for distinguishing highly similar classes. We assume that part-based methods suffer from the lack of a representation of local features that is invariant to the order of parts and can handle a varying number of visible parts appropriately. The order of parts is artificial and often only given by ground-truth annotations, whereas viewpoint variations and occlusions result in parts that cannot be observed. Therefore, we propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters for this encoding are estimated by an online EM algorithm jointly with those of the neural network and are more precise than the estimates of previous works. Our approach improves state-of-the-art…
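The order invariance and the handling of a varying number of parts described above can be illustrated with a standard Fisher vector over a Gaussian mixture with diagonal covariances. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `fisher_vector`, the diagonal covariances, and the toy parameters are illustrative, and the mixture is fixed here rather than learned jointly with a CNN.

```python
import numpy as np

def fisher_vector(parts, weights, means, variances):
    """Encode part features (T x D) as a vector of length 2 * K * D.

    weights: (K,), means: (K, D), variances: (K, D) of a diagonal GMM.
    """
    T = parts.shape[0]
    diff = parts[:, None, :] - means[None, :, :]                # (T, K, D)

    # Soft assignment of every part feature to every mixture component.
    log_prob = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)             # stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                     # (T, K)

    # Normalized gradients with respect to means and standard deviations.
    norm_diff = diff / np.sqrt(variances)                       # (T, K, D)
    g_mu = np.einsum('tk,tkd->kd', post, norm_diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum('tk,tkd->kd', post, norm_diff**2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Toy usage: K = 2 components, D = 4 feature dimensions.
rng = np.random.default_rng(0)
gmm_weights = np.array([0.5, 0.5])
gmm_means = rng.normal(size=(2, 4))
gmm_vars = np.ones((2, 4))

five_parts = rng.normal(size=(5, 4))
fv_all = fisher_vector(five_parts, gmm_weights, gmm_means, gmm_vars)
fv_shuffled = fisher_vector(five_parts[::-1], gmm_weights, gmm_means, gmm_vars)
fv_occluded = fisher_vector(five_parts[:3], gmm_weights, gmm_means, gmm_vars)
```

Because the encoding sums over parts and divides by their count, reordering the parts leaves the result unchanged up to floating-point rounding, and an input with fewer visible parts still yields a vector of the same length.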
Taxonomy
Topics: Wildlife Ecology and Conservation · Species Distribution and Climate Change · Morphological variations and asymmetry
