Fisher Kernel for Deep Neural Activations
Donggeun Yoo, Sunggyun Park, Joon-Young Lee, In So Kweon

TL;DR
This paper combines multi-scale dense CNN activations with Fisher kernel aggregation. The result pairs the rich mid-level representation of CNN activations with the geometric invariance of representations built from local descriptors, and it substantially improves image recognition performance.
Contribution
It proposes (1) an efficient method for extracting a large number of multi-scale dense local activations from a pre-trained CNN and (2) a Fisher kernel modified with a simple scale-wise normalization that makes it suitable for CNN activations.
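The idea of multi-scale dense extraction can be pictured with a minimal sketch. It assumes a stand-in `feature_fn` that maps one fixed-size patch to a vector; in practice this would be a pre-trained CNN's activation, but here it is a hypothetical placeholder. The image is rescaled to several scales, and a fixed-size window is slid over each rescaled image with a fixed stride, so finer scales yield more local activations. `resize_nearest`, `dense_multiscale_activations`, the scales, the patch size, and the stride are all illustrative assumptions. This naive loop calls the feature function once per patch and does not reproduce the efficient extraction the paper proposes.

```python
import numpy as np

def resize_nearest(img, scale):
    """Nearest-neighbour rescale of an H x W (x C) array by `scale`."""
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = (np.arange(new_h) * h / new_h).astype(int)  # source row per output row
    cols = (np.arange(new_w) * w / new_w).astype(int)  # source col per output col
    return img[rows][:, cols]

def dense_multiscale_activations(img, feature_fn, scales=(0.5, 1.0, 2.0),
                                 patch=32, stride=16):
    """Return {scale: (N_s, D) array} of activations on a dense patch grid.

    `feature_fn` is a hypothetical stand-in for a pre-trained CNN that maps
    a (patch x patch) crop to a D-dimensional activation vector.
    """
    out = {}
    for s in scales:
        scaled = resize_nearest(img, s)
        h, w = scaled.shape[:2]
        feats = [
            feature_fn(scaled[y:y + patch, x:x + patch])
            for y in range(0, h - patch + 1, stride)
            for x in range(0, w - patch + 1, stride)
        ]
        if feats:  # skip scales too small to hold a single patch
            out[s] = np.stack(feats)
    return out
```

With a 64 × 64 input, 32-pixel patches, and a 16-pixel stride, the scales 0.5, 1.0, and 2.0 give 1, 9, and 49 patches respectively, so the finer scales contribute most of the local activations.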
Findings
+17.76 accuracy points on MIT Indoor 67
+7.18 mAP points on PASCAL VOC 2007
Both gains are measured against directly using a single CNN activation vector as the image representation
Abstract
Compared to image representation based on low-level local descriptors, deep neural activations of Convolutional Neural Networks (CNNs) are richer in mid-level representation, but poorer in geometric invariance properties. In this paper, we present a straightforward framework for better image representation by combining the two approaches. To take advantage of both representations, we propose an efficient method to extract a fair amount of multi-scale dense local activations from a pre-trained CNN. We then aggregate the activations with the Fisher kernel framework, which we modify with a simple scale-wise normalization that is essential for making it suitable for CNN activations. Replacing the direct use of a single activation vector with our representation demonstrates significant performance improvements: +17.76 (Acc.) on MIT Indoor 67 and +7.18 (mAP) on PASCAL VOC 2007. The results suggest…
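The aggregation step can be sketched as follows, assuming a diagonal-covariance Gaussian mixture (`weights`, `means`, `sigmas`) has already been fitted to training activations. `fisher_vector` implements the standard improved Fisher vector (gradients with respect to the GMM means and standard deviations, followed by power and L2 normalization), which may differ in detail from the paper's variant. `scale_wise_fisher` encodes one reading of the scale-wise normalization: each scale's activations are encoded and L2-normalized separately, and the per-scale vectors are then averaged, so fine scales with many patches do not dominate the image representation. That combination rule is an assumption, and both function names are hypothetical.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Improved Fisher vector of descriptors X (N, D) under a diagonal GMM.

    weights: (K,), means: (K, D), sigmas: (K, D) standard deviations.
    Returns a power- and L2-normalized vector of length 2 * K * D.
    """
    N = X.shape[0]
    diff = (X[:, None, :] - means[None]) / sigmas[None]          # (N, K, D)
    # Log-likelihood of each descriptor under each Gaussian, plus log prior.
    log_p = -0.5 * np.sum(diff ** 2 + np.log(2 * np.pi * sigmas[None] ** 2), axis=2)
    log_p += np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)                     # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                     # posteriors (N, K)
    # Gradients w.r.t. means and standard deviations, averaged over descriptors.
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(weights)[:, None])
    g_sig = np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1) / (N * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                        # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                      # L2 normalization

def scale_wise_fisher(acts_by_scale, weights, means, sigmas):
    """Encode each scale separately, then average the unit-norm vectors.

    acts_by_scale: {scale: (N_s, D) array}. Assumed combination rule:
    every scale gets equal weight regardless of its patch count.
    """
    fvs = [fisher_vector(X, weights, means, sigmas) for X in acts_by_scale.values()]
    fv = np.mean(fvs, axis=0)
    return fv / (np.linalg.norm(fv) + 1e-12)
```

Because each per-scale vector is L2-normalized before averaging, a scale contributes the same weight however many patches it produced. Concatenating the per-scale vectors instead would keep the scales separate at the cost of a longer descriptor.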
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
