Deep neural networks with Fisher vector encoding for medical image classification

Lucas O. Lyra; Antonio E. Fabris; Joao B. Florindo

arXiv:2605.01667·cs.CV·May 5, 2026


TL;DR

This paper introduces a Fisher Vector encoding method integrated with hybrid CNN + ViT models for medical image classification, improving performance across datasets of varying sizes.

Contribution

It proposes a novel approach combining Fisher Vectors with hybrid CNN + ViT architectures, addressing dataset size scalability and computational efficiency.

Findings

1. Outperforms benchmarks on all MedMNIST (v2) datasets.

2. Achieves competitive results on Clean-CC-CCII and ISIC2018.

3. Proposes a cost-limiting GMM estimation method for large datasets.

Abstract

Orderless encoding methods have been shown to improve Convolutional Neural Networks (CNNs) for image classification when data availability is limited. Additionally, hybrid CNN + Vision Transformer (ViT) models have recently been proposed to address the locality bias of CNNs, and these models have outperformed CNN-only approaches. Nevertheless, integrating such hybrid models with more elaborate feature representations can be highly beneficial and remains largely unexplored in the literature. In this context, we propose introducing an orderless encoding method, Fisher Vectors, into hybrid CNN + ViT architectures, aiming at a model suitable for both small and large datasets. This encoding method relies on estimating a Gaussian Mixture Model (GMM) on image features. In large datasets, the computational cost of GMM estimation is a limiting factor for the application of…
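To make the encoding step concrete, here is a minimal sketch of the standard Fisher Vector computation for a diagonal-covariance GMM (gradients with respect to the means and variances). It is an illustration only, not the paper's implementation. The function name `fisher_vector` and its arguments are hypothetical, the GMM parameters are assumed to be already estimated, and the power and L2 normalizations commonly applied afterwards are omitted. How the paper extracts local features from the CNN + ViT backbone and how it limits the GMM estimation cost are not described in the visible part of the abstract, so neither is modeled here.

```python
import math


def fisher_vector(descriptors, weights, means, variances):
    """Encode T local descriptors (each of length D) as a Fisher vector.

    weights, means, variances are the parameters of a K-component GMM with
    diagonal covariances (assumed to be estimated beforehand).
    Returns a flat list of length 2 * K * D.
    """
    T = len(descriptors)
    K = len(weights)
    D = len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]

    for x in descriptors:
        # Log of w_k * N(x | mu_k, diag(var_k)) for every component k.
        log_p = []
        for k in range(K):
            lp = math.log(weights[k])
            for d in range(D):
                var = variances[k][d]
                lp -= 0.5 * (math.log(2.0 * math.pi * var)
                             + (x[d] - means[k][d]) ** 2 / var)
            log_p.append(lp)

        # Posterior (soft assignment) of x to each component, computed
        # with the max-subtraction trick for numerical stability.
        m = max(log_p)
        unnorm = [math.exp(lp - m) for lp in log_p]
        total = sum(unnorm)

        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                grad_mu[k][d] += gamma * z                # first-order statistic
                grad_sigma[k][d] += gamma * (z * z - 1.0)  # second-order statistic

    # Normalize by the number of descriptors and the Fisher information terms.
    fv = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k])
    return fv
```

Because the statistics are summed over all descriptors, the result does not depend on the order in which the descriptors are supplied, which is the sense in which the encoding is orderless. The output length depends only on the number of GMM components and the descriptor dimension, not on the number of descriptors per image.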
