Uncertainty Quantification in Detection Transformers: Object-Level Calibration and Image-Level Reliability

Young-Jin Park; Carson Sobolewski; Navid Azizan

arXiv:2412.01782 · cs.CV · April 22, 2026

TL;DR

This paper investigates the reliability of DETR object detection predictions, reveals the specialist calibration strategy that DETRs learn, and introduces the Object-level Calibration Error (OCE) for better uncertainty quantification and model evaluation.

Contribution

The paper provides empirical and theoretical evidence on how DETRs calibrate their predictions and proposes OCE for more reliable model assessment and uncertainty quantification.

Findings

1. DETR predictions follow an optimal specialist calibration strategy.
2. Existing metrics like AP and ECE are inadequate for reliability assessment.
3. OCE effectively evaluates model calibration and identifies reliable predictions.
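Finding 2 refers to the expected calibration error (ECE). For reference, here is a minimal sketch of the standard binned ECE: equal-width confidence bins, with each bin contributing (bin size / N) * |accuracy - mean confidence|. The confidence and correctness values are hypothetical toy numbers, not results from the paper, and the 0.5 score threshold is likewise an assumption. The example shows only that the same set of detections yields a different ECE depending on which predictions are kept for evaluation. This is one way to see why a single detection-level ECE figure can be hard to interpret on its own.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, label in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, label))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical detections: two confident correct predictions and three
# low-confidence predictions labeled incorrect.
confidences = [0.9, 0.8, 0.05, 0.02, 0.03]
correct = [1, 1, 0, 0, 0]

ece_all = expected_calibration_error(confidences, correct)
kept = [(c, y) for c, y in zip(confidences, correct) if c >= 0.5]  # hypothetical threshold
ece_kept = expected_calibration_error([c for c, _ in kept], [y for _, y in kept])
print(round(ece_all, 3), round(ece_kept, 3))  # → 0.08 0.15
```

Here the model and its outputs are identical in both calls. Only the evaluated subset changes, yet the score nearly doubles.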

Abstract

DETR and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions that far outnumber the actual objects present in an image. This raises a critical question: which of these predictions can be trusted? The question is particularly important for safety-critical applications such as autonomous vehicles. Addressing this concern, we provide empirical and theoretical evidence that predictions within the same image play distinct roles, resulting in varying levels of reliability. Our analysis reveals that DETRs employ an optimal specialist strategy: one prediction per object is trained to be well-calibrated, while the remaining predictions are trained to suppress their foreground confidence to near zero, even when they maintain accurate localization. We show that this strategy…
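The abstract states the specialist strategy but not its mechanism. One plausible intuition comes from how DETR is trained. Each ground-truth object is assigned to exactly one prediction through bipartite (Hungarian) matching, and every unmatched prediction is supervised toward "no object". The sketch below illustrates this under simplifying assumptions and is not the authors' analysis:

- The hypothetical `match_cost` (negative confidence minus IoU) stands in for DETR's actual matching cost, which combines class probability with L1 and generalized-IoU box terms.
- Brute-force search over assignments replaces the Hungarian algorithm so that only the standard library is needed.
- The helpers `one_to_one_match` and `training_targets`, and the toy boxes, are illustrative.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(pred_score, pred_box, gt_box):
    # Hypothetical simplified cost: lower when the prediction is confident and overlaps well.
    return -pred_score - iou(pred_box, gt_box)


def one_to_one_match(preds, gts):
    """Brute-force minimum-cost assignment of each ground truth to a distinct prediction."""
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p][0], preds[p][1], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_assign  # best_assign[g] = index of the prediction matched to ground truth g


def training_targets(preds, gts):
    """Foreground target 1 for matched predictions, 0 ("no object") for all others."""
    matched = set(one_to_one_match(preds, gts))
    return [1 if i in matched else 0 for i in range(len(preds))]


# Toy predictions as (confidence, box); two ground-truth objects.
preds = [
    (0.9, (0, 0, 10, 10)),    # strong hit on object 0
    (0.3, (1, 1, 10, 10)),    # duplicate on object 0 with IoU 0.81
    (0.8, (20, 20, 30, 30)),  # strong hit on object 1
    (0.1, (50, 50, 60, 60)),  # background
    (0.2, (21, 21, 30, 30)),  # duplicate on object 1
]
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
print(training_targets(preds, gts))  # → [1, 0, 1, 0, 0]
```

Prediction 1 overlaps object 0 with an IoU of 0.81 yet still receives a foreground target of 0. This mirrors the abstract's point that non-specialist predictions are pushed toward zero confidence even when they localize well. Brute force scales factorially with the number of predictions, so a practical implementation would use a polynomial-time assignment solver instead.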
