PaQ-DETR: Learning Pattern and Quality-Aware Dynamic Queries for Object Detection
Zhengjian Kang, Jun Zhuang, Kangtong Mo, Qi Chen, Rui Liu, Ye Zhang

TL;DR
PaQ-DETR is a unified DETR framework that learns a compact set of shared latent patterns and generates image-specific queries from them, improving query adaptability, supervision balance, and interpretability across multiple benchmarks.
Contribution
It proposes a pattern- and quality-aware approach that enhances query adaptivity and supervision balance in DETR models, leading to consistent accuracy gains.
Findings
Achieves 1.5%-4.2% mAP improvement on COCO and CityScapes.
Learns shared latent patterns capturing global semantics.
Provides interpretable insights into semantic clustering of patterns.
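The idea of building each query from a small set of shared latent patterns can be illustrated with a short sketch. This is not the authors' implementation: the pooled image feature, the per-query linear projection (`weight_proj`), and the softmax normalization are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_queries(patterns, image_feature, weight_proj):
    """Mix K shared patterns into image-specific queries.

    patterns:      K vectors of length D (shared across all images).
    image_feature: one pooled feature vector of length C (assumed input).
    weight_proj:   one K x C matrix per query (hypothetical projection that
                   turns the image feature into K mixing logits).
    Returns one length-D query per projection matrix.
    """
    dim = len(patterns[0])
    queries = []
    for proj in weight_proj:
        # Content-conditioned logits: one score per shared pattern.
        logits = [sum(w * f for w, f in zip(row, image_feature)) for row in proj]
        weights = softmax(logits)
        # Each query is a convex combination of the shared patterns.
        query = [
            sum(weights[k] * patterns[k][d] for k in range(len(patterns)))
            for d in range(dim)
        ]
        queries.append(query)
    return queries
```

Because the weights depend on the image feature, two images yield different queries from the same pattern bank, while the patterns themselves stay shared and can be inspected for semantic clustering.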
Abstract
Detection Transformer (DETR) has redefined object detection by casting it as a set prediction task within an end-to-end framework. Despite its elegance, DETR and its variants still rely on fixed learnable queries and suffer from severe query utilization imbalance, which limits adaptability and leaves the model capacity underused. We propose PaQ-DETR (Pattern and Quality-Aware DETR), a unified framework that enhances both query adaptivity and supervision balance. It learns a compact set of shared latent patterns capturing global semantics and dynamically generates image-specific queries through content-conditioned weighting. In parallel, a quality-aware one-to-many assignment strategy adaptively selects positive samples based on localization-classification consistency, enriching supervision and promoting balanced query optimization. Experiments on COCO, CityScapes, and other benchmarks…
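The quality-aware one-to-many assignment described in the abstract can be pictured with a small sketch. The abstract does not give the scoring formula, so everything specific here is an assumption: the quality score is taken as a geometric blend of classification score and IoU, and positives are the predictions whose quality is within a fixed ratio of the best one. The names `quality`, `select_positives`, `alpha`, and `ratio` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def quality(cls_score, box_iou, alpha=0.5):
    """Assumed consistency score: high only if both terms are high."""
    return (cls_score ** alpha) * (box_iou ** (1.0 - alpha))


def select_positives(pred_boxes, pred_scores, gt_box, alpha=0.5, ratio=0.8):
    """Return indices of predictions treated as positives for one object.

    pred_scores holds each prediction's score for the object's class.
    The number of positives adapts to the quality distribution: every
    prediction within `ratio` of the best quality is kept (assumed rule).
    """
    scores = [
        quality(s, iou(b, gt_box), alpha)
        for b, s in zip(pred_boxes, pred_scores)
    ]
    if not scores:
        return []
    best = max(scores)
    if best <= 0.0:
        return []
    return [i for i, q in enumerate(scores) if q >= ratio * best]


gt = (0.0, 0.0, 10.0, 10.0)
boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 9.0), (20.0, 20.0, 30.0, 30.0)]
cls_scores = [0.9, 0.8, 0.95]
positives = select_positives(boxes, cls_scores, gt)
```

In this example the third prediction is confident but does not overlap the object, so its quality is zero and it is not selected, whereas the two well-localized predictions both become positives. This is how a one-to-many scheme can supervise more than one query per object.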
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
