Taming Detection Transformers for Medical Object Detection

Marc K. Ickler; Michael Baumgartner; Saikat Roy; Tassilo Wald; Klaus H. Maier-Hein

arXiv:2306.15472 · cs.CV · June 28, 2023

TL;DR

This paper explores the application of DETR models for volumetric medical object detection, demonstrating they can match or surpass traditional methods without complex heuristics.

Contribution

It is the first comprehensive evaluation of DETR-based models for 3D medical image detection, showing their competitive performance against established approaches.

Findings

01. DINO DETR outperforms Retina U-Net on three of the four evaluated datasets.

02. DETR models achieve accuracy comparable to or better than existing methods.

03. Set prediction models simplify the detection pipeline by removing heuristics.

Abstract

The accurate detection of suspicious regions in medical images is an error-prone and time-consuming process required by many routinely performed diagnostic procedures. To support clinicians during this difficult task, several automated solutions were proposed relying on complex methods with many hyperparameters. In this study, we investigate the feasibility of DEtection TRansformer (DETR) models for volumetric medical object detection. In contrast to previous works, these models directly predict a set of objects without relying on the design of anchors or manual heuristics such as non-maximum-suppression to detect objects. We show by conducting extensive experiments with three models, namely DETR, Conditional DETR, and DINO DETR on four data sets (CADA, RibFrac, KiTS19, and LIDC) that these set prediction models can perform on par with or even better than currently existing methods.…
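To make the set prediction idea concrete, the following is a minimal sketch of the one-to-one matching between predictions and ground-truth objects that DETR-style training relies on. It is not the authors' implementation: the cost is a simplified version (class probability plus L1 box distance, with a hypothetical `l1_weight`), boxes are assumed to be normalized `(cz, cy, cx, depth, height, width)` tuples, and the matching is done by brute force rather than with the Hungarian algorithm, so it only suits tiny inputs.

```python
from itertools import permutations


def match_cost(pred, target, l1_weight=1.0):
    """Simplified DETR-style cost: low when the class is likely and the box is close."""
    class_term = -pred["probs"][target["label"]]
    box_term = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    return class_term + l1_weight * box_term


def match_predictions(preds, targets, l1_weight=1.0):
    """Return {target_index: pred_index} with minimal total cost, one-to-one.

    Brute force over all assignments; illustrative only. Real DETR code
    solves the same objective with the Hungarian algorithm.
    """
    assert len(preds) >= len(targets)
    best_cost, best_assignment = float("inf"), ()
    for chosen in permutations(range(len(preds)), len(targets)):
        cost = sum(
            match_cost(preds[p], targets[t], l1_weight)
            for t, p in enumerate(chosen)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, chosen
    return dict(enumerate(best_assignment))


# Hypothetical toy data: class 0 = lesion, class 1 = "no object".
preds = [
    {"probs": [0.9, 0.1], "box": (0.50, 0.50, 0.50, 0.20, 0.20, 0.20)},
    {"probs": [0.8, 0.2], "box": (0.52, 0.50, 0.50, 0.20, 0.20, 0.20)},  # near-duplicate
    {"probs": [0.7, 0.3], "box": (0.20, 0.30, 0.30, 0.10, 0.10, 0.10)},
]
targets = [
    {"label": 0, "box": (0.50, 0.50, 0.50, 0.20, 0.20, 0.20)},
    {"label": 0, "box": (0.20, 0.30, 0.30, 0.10, 0.10, 0.10)},
]

matching = match_predictions(preds, targets)
unmatched = [i for i in range(len(preds)) if i not in matching.values()]
print(matching)   # {0: 0, 1: 2}
print(unmatched)  # [1]
```

Each ground-truth object is assigned exactly one prediction. During training, the unmatched near-duplicate (prediction 1) would be pushed toward the "no object" class, which is why such models can avoid anchors and non-maximum suppression at inference time.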

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Vision Transformer · Absolute Position Encodings · Linear Layer · Convolution · Layer Normalization · Label Smoothing · Dense Connections