Q-Former Autoencoder: A Modern Framework for Medical Anomaly Detection
Francesco Dalmonte, Emirhan Bayar, Emre Akbas, Mariana-Iuliana Georgescu

TL;DR
This paper introduces the Q-Former Autoencoder, an unsupervised framework for medical anomaly detection that pairs frozen pretrained vision foundation models with a Q-Former bottleneck, improving detection accuracy without fine-tuning the encoders on domain-specific data.
Contribution
The paper presents a modern autoencoder framework utilizing frozen vision foundation models and a Q-Former bottleneck architecture for effective medical anomaly detection.
Findings
Achieves state-of-the-art results on multiple medical benchmarks
Effectively leverages pretrained vision models without fine-tuning
Demonstrates strong generalization across diverse medical datasets
Abstract
Anomaly detection in medical images is an important yet challenging task due to the diversity of possible anomalies and the practical impossibility of collecting comprehensively annotated data sets. In this work, we tackle unsupervised medical anomaly detection by proposing a modernized autoencoder-based framework, the Q-Former Autoencoder, that leverages state-of-the-art pretrained vision foundation models, such as DINO, DINOv2 and Masked Autoencoder. Instead of training encoders from scratch, we directly utilize frozen vision foundation models as feature extractors, enabling rich, multi-stage, high-level representations without domain-specific fine-tuning. We propose the use of the Q-Former architecture as the bottleneck, which enables control of the length of the reconstruction sequence while efficiently aggregating multiscale features. Additionally, we incorporate a perceptual…
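To make the bottleneck idea in the abstract concrete, the following is a minimal, dependency-free sketch of how a fixed set of queries can cross-attend to features gathered from several encoder stages, so that the output length is set by the number of queries rather than by the number of input tokens. It is not the authors' implementation: the names `cross_attend`, `random_tokens`, `DIM`, and `NUM_QUERIES` are hypothetical, the features and queries are random stand-ins for frozen-encoder outputs and learned parameters, and the projections, query self-attention, and feed-forward layers of a real Q-Former are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Single-head cross-attention without learned projections.

    Each query yields one output vector: a softmax-weighted mean of the tokens.
    The number of outputs therefore equals the number of queries.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, t)) for t in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(dim)]
        )
    return outputs


def random_tokens(count, dim, rng):
    """Random vectors used as placeholders for real features or parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8          # hypothetical feature width
NUM_QUERIES = 4  # hypothetical bottleneck length

# Assumption: three stages of a frozen encoder, 16 tokens each, same width.
stage_features = [random_tokens(16, DIM, rng) for _ in range(3)]
tokens = [t for stage in stage_features for t in stage]  # 48 tokens in total

# In a trained model the queries would be learned; here they are random.
queries = random_tokens(NUM_QUERIES, DIM, rng)

bottleneck = cross_attend(queries, tokens)
print(len(tokens), len(bottleneck), len(bottleneck[0]))  # 48 4 8
```

Changing `NUM_QUERIES` changes the length of the bottleneck sequence while the encoder features stay untouched, which is the property the abstract attributes to the Q-Former bottleneck.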
Taxonomy
Topics: Anomaly Detection Techniques and Applications
