Beyond Hungarian: Match-Free Supervision for End-to-End Object Detection
Shoumeng Qiu, Xinrun Li, Yang Long

TL;DR
This paper introduces a matching-free training scheme for DETR-based object detectors that uses a cross-attention mechanism to learn implicit query-target correspondences, improving training efficiency and performance.
Contribution
It proposes a novel differentiable matching-free approach with a Cross-Attention-based Query Selection module, eliminating the need for Hungarian matching in end-to-end object detection.
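For context, the discrete step being removed can be sketched as follows. This is a toy illustration, not the paper's or DETR's implementation: it finds the minimum-cost one-to-one assignment by brute force, standing in for the Hungarian algorithm, and uses a plain L1 box cost. The function names and the cost are assumptions made for the example. The permutation search is not differentiable, which is the property a matching-free scheme avoids.

```python
from itertools import permutations


def l1(a, b):
    """L1 distance between two boxes given as equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def min_cost_assignment(pred_boxes, gt_boxes):
    """Brute-force stand-in for Hungarian matching (hypothetical helper).

    Returns one distinct query index per ground truth, chosen to minimize
    the total L1 cost. Assumes len(pred_boxes) >= len(gt_boxes).
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1(pred_boxes[q], gt) for q, gt in zip(perm, gt_boxes))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Toy data: three query predictions, two ground-truth boxes (cx, cy, w, h).
pred_boxes = [[0.7, 0.7, 0.2, 0.2], [0.2, 0.2, 0.1, 0.1], [0.5, 0.5, 0.5, 0.5]]
gt_boxes = [[0.2, 0.2, 0.1, 0.1], [0.7, 0.7, 0.2, 0.2]]
assignment, total_cost = min_cost_assignment(pred_boxes, gt_boxes)
```

Here `assignment[j]` is the query index matched to ground truth `j`. Real detectors use a polynomial-time solver and a cost that also includes classification terms.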
Findings
Reduces matching latency by over 50%
Enhances training efficiency and stability
Achieves superior detection performance
Abstract
Recent DEtection TRansformer (DETR) based frameworks have achieved remarkable success in end-to-end object detection. However, the reliance on the Hungarian algorithm for bipartite matching between queries and ground truths introduces computational overhead and complicates the training dynamics. In this paper, we propose a novel matching-free training scheme for DETR-based detectors that eliminates the need for explicit heuristic matching. At the core of our approach is a dedicated Cross-Attention-based Query Selection (CAQS) module. Instead of discrete assignment, we utilize encoded ground-truth information to probe the decoder queries through a cross-attention mechanism. By minimizing the weighted error between the queried results and the ground truths, the model autonomously learns the implicit correspondences between object queries and specific targets. This learned relationship…
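The probing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' CAQS module: the scaled dot-product scoring, the L1 box error, the attention-weighted form of the loss, and all function and variable names are assumptions made for illustration. Each encoded ground truth acts as an attention query over the decoder queries, and the resulting soft weights replace a discrete assignment.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(gt_embed, query_embed):
    """One row per ground truth: soft weights over all decoder queries."""
    scale = math.sqrt(len(query_embed[0]))
    return [
        softmax([sum(g * q for g, q in zip(gt, qu)) / scale for qu in query_embed])
        for gt in gt_embed
    ]


def weighted_l1_error(weights, query_boxes, gt_boxes):
    """Attention-weighted L1 error, averaged over ground truths (assumed form)."""
    total = 0.0
    for w_row, gt_box in zip(weights, gt_boxes):
        for w, box in zip(w_row, query_boxes):
            total += w * sum(abs(p - t) for p, t in zip(box, gt_box))
    return total / len(gt_boxes)


# Toy data: two encoded ground truths probing three decoder queries.
gt_embed = [[4.0, 0.0], [0.0, 4.0]]
query_embed = [[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]]
query_boxes = [[0.7, 0.7, 0.2, 0.2], [0.2, 0.2, 0.1, 0.1], [0.5, 0.5, 0.5, 0.5]]
gt_boxes = [[0.2, 0.2, 0.1, 0.1], [0.7, 0.7, 0.2, 0.2]]

weights = attention_weights(gt_embed, query_embed)
loss = weighted_l1_error(weights, query_boxes, gt_boxes)
# Most-attended query per ground truth, i.e. the implied correspondence.
selected = [row.index(max(row)) for row in weights]
```

In a real model the embeddings would be learned and the loss would be minimized by gradient descent, so the weights concentrate on the query that best explains each target. The pure-Python version only shows the data flow.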
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
