Deep neural networks with Fisher vector encoding for medical image classification
Lucas O. Lyra, Antonio E. Fabris, Joao B. Florindo

TL;DR
This paper introduces a Fisher Vector encoding method integrated with hybrid CNN + ViT models for medical image classification, improving performance across datasets of varying sizes.
Contribution
It proposes a novel approach combining Fisher Vectors with hybrid CNN + ViT architectures, addressing dataset size scalability and computational efficiency.
Findings
Outperforms benchmarks on all MedMNIST (v2) datasets.
Achieves competitive results on Clean-CC-CCII and ISIC2018.
Proposes a cost-limiting GMM estimation method for large datasets.
Abstract
Orderless encoding methods have been shown to improve Convolutional Neural Networks (CNNs) for image classification when data availability is limited. Additionally, hybrid CNN + Vision Transformer (ViT) models have recently been proposed to address the locality bias of CNNs, and these models have outperformed CNN-only approaches. Despite that, integrating such hybrid models with more elaborate feature representations could be highly beneficial and remains largely unexplored in the literature. In this context, we propose introducing an orderless encoding method, Fisher Vectors, into hybrid CNN + ViT architectures, aiming at a model suitable for both small and large datasets. This encoding method relies on estimating a Gaussian Mixture Model (GMM) on image features. In large datasets, the computational cost of the GMM estimation is a limiting factor for the application of…
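To make the encoding step concrete, the following is a minimal sketch of the standard "improved" Fisher Vector formulation (gradients with respect to the GMM means and standard deviations, followed by power and L2 normalization). It is not taken from the paper: the function name `fisher_vector`, the diagonal-covariance assumption, the toy feature shapes, and the randomly chosen GMM parameters are all assumptions made for illustration. In practice the GMM would be estimated from training features rather than drawn at random.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Encode local descriptors x (N, D) with a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    Returns a vector of length 2 * K * D.
    Hypothetical helper for illustration, not the authors' code.
    """
    n = x.shape[0]
    sigma = np.sqrt(variances)
    # Standardized differences between each descriptor and each component: (N, K, D)
    diff = (x[:, None, :] - means[None, :, :]) / sigma[None, :, :]

    # Log of w_k * N(x_n | mu_k, diag(var_k)), used for the soft assignments
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)  # posteriors, shape (N, K)

    # Normalized gradients with respect to means and standard deviations
    g_mu = np.einsum("nk,nkd->kd", gamma, diff)
    g_mu /= n * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= n * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv  # L2 normalization


# Toy usage: 49 local features of dimension 8 (e.g. a flattened 7x7 feature map),
# encoded with an assumed 4-component GMM whose parameters are random placeholders.
rng = np.random.default_rng(0)
features = rng.normal(size=(49, 8))
gmm_weights = np.full(4, 0.25)
gmm_means = rng.normal(size=(4, 8))
gmm_variances = np.ones((4, 8))

fv = fisher_vector(features, gmm_weights, gmm_means, gmm_variances)
```

The resulting vector has a fixed length of 2 × K × D regardless of how many local features the image produces, which is what makes the encoding orderless and usable as input to a standard classifier.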
