Understanding differences in applying DETR to natural and medical images
Yanqi Xu, Yiqiu Shen, Carlos Fernandez-Granda, Laura Heacock, Krzysztof J. Geras

TL;DR
This paper investigates how transformer-based object detectors, which are successful on natural images, perform on medical images such as screening mammograms. It finds that simpler models often outperform complex natural-image designs in this setting.
Contribution
The paper demonstrates that standard natural-image detection strategies may not be optimal for medical imaging, and it advocates simplified architectures tailored to the characteristics of medical data.
Findings
Complex encoder architectures do not improve medical image detection.
Simpler, shallower models often perform better on medical data.
Standard multi-scale and iterative refinement strategies may impair performance in medical imaging.
Abstract
Transformer-based detectors have shown success in computer vision tasks with natural images. These models, exemplified by Deformable DETR, are optimized through complex engineering strategies tailored to the typical characteristics of natural scenes. However, medical imaging data presents unique challenges such as extremely large image sizes, fewer and smaller regions of interest, and object classes that can be differentiated only through subtle differences. This study evaluates the applicability of these transformer-based design choices to a screening mammography dataset that represents these distinct medical imaging data characteristics. Our analysis reveals that common design choices from the natural image domain, such as complex encoder architectures, multi-scale feature fusion, query initialization, and iterative bounding box refinement, do not improve and…
Taxonomy
Topics: Medical Image Segmentation Techniques · Radiomics and Machine Learning in Medical Imaging
Methods: Deformable Attention Module · Deformable DETR
