Approximate Fisher Kernels of non-iid Image Models for Image Categorization
Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid

TL;DR
This paper introduces non-iid image models whose Fisher kernels are computed with variational approximations. By capturing dependencies among local descriptors, the models improve image categorization performance and help explain why power normalization is effective.
Contribution
The paper presents a non-iid modeling approach for Fisher kernel image representations, uses variational bounds to keep the computation tractable, and reports gains in categorization performance.
Findings
Performance comparable to power normalization methods
Models capture dependencies among local descriptors
Improves image categorization accuracy
Abstract
The bag-of-words (BoW) model treats images as sets of local descriptors and represents them by visual word histograms. The Fisher vector (FV) representation extends BoW by considering the first- and second-order statistics of local descriptors. In both representations local descriptors are assumed to be identically and independently distributed (iid), which is a poor assumption from a modeling perspective. It has been experimentally observed that the performance of BoW and FV representations can be improved by employing discounting transformations such as power normalization. In this paper, we introduce non-iid models by treating the model parameters as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel principle we encode an image by the gradient of the data log-likelihood w.r.t. the model hyper-parameters. Our models naturally…
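The discounting effect of power normalization mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy visual-word assignments, and the exponent rho = 0.5 (a signed square root, a commonly used setting) are assumptions made for illustration only.

```python
import math
from collections import Counter


def bow_histogram(word_ids, vocab_size):
    """Count how often each visual word index occurs in one image."""
    counts = Counter(word_ids)
    return [float(counts.get(k, 0)) for k in range(vocab_size)]


def power_normalize(vec, rho=0.5):
    """Apply sign(z) * |z|**rho element-wise (rho is an assumed value)."""
    return [math.copysign(abs(v) ** rho, v) for v in vec]


def l2_normalize(vec):
    """Scale the vector to unit Euclidean length; leave zero vectors as-is."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


# Hypothetical image: nine descriptors assigned to word 0, one to word 1.
word_ids = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
hist = bow_histogram(word_ids, vocab_size=3)
discounted = power_normalize(hist)
unit = l2_normalize(discounted)
```

In this toy case the raw counts of the two occupied bins stand in a 9:1 ratio, and the square root reduces that to 3:1, so repeated occurrences of the same visual word contribute less than they would under an iid count model.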
