LP-DETR: Layer-wise Progressive Relations for Object Detection

Zhengjian Kang; Ye Zhang; Xiaoyu Deng; Xintao Li; Yongzhe Zhang

arXiv:2502.05147 · cs.CV · May 14, 2025

TL;DR

LP-DETR introduces layer-wise progressive relation modeling, which improves object detection by adaptively learning multi-scale spatial relations between object queries, yielding faster convergence and higher accuracy on the COCO dataset.

Contribution

The paper proposes a novel layer-wise progressive relation mechanism for DETR, improving multi-scale relation modeling and detection performance.

Findings

01 Achieves 52.3% AP with 12 epochs on COCO using a ResNet-50 backbone

02 Learns to prioritize local relations in early decoder layers and global relations in later ones

03 Improves convergence speed and detection accuracy over a standard self-attention module

Abstract

This paper presents LP-DETR (Layer-wise Progressive DETR), a novel approach that enhances DETR-based object detection through multi-scale relation modeling. Our method introduces learnable spatial relationships between object queries through a relation-aware self-attention mechanism, which adaptively learns to balance different scales of relations (local, medium and global) across decoder layers. This progressive design enables the model to effectively capture evolving spatial dependencies throughout the detection pipeline. Extensive experiments on the COCO 2017 dataset demonstrate that our method improves both convergence speed and detection accuracy compared to a standard self-attention module. The proposed method achieves competitive results, reaching 52.3% AP with 12 epochs and 52.5% AP with 24 epochs using a ResNet-50 backbone, and further improving to 58.0% AP with a Swin-L backbone.…
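The abstract does not spell out the formulation, so the following is only a minimal sketch of what a relation-aware self-attention layer with per-layer scale weights might look like. It makes several assumptions: relations are binned into local, medium and global by the distance between box centers, the thresholds `local_radius` and `medium_radius` are hypothetical, the relation term enters as an additive bias on the attention scores, and the per-layer weights come from a softmax over three logits. None of these details, nor the function names, are taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scale_masks(centers, local_radius=0.1, medium_radius=0.3):
    """Assign every query pair to exactly one scale by box-center distance.

    The two radii are hypothetical thresholds chosen for illustration.
    Returns an array of shape (3, N, N): local, medium, global.
    """
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    local = dist < local_radius
    medium = (dist >= local_radius) & (dist < medium_radius)
    global_ = dist >= medium_radius
    return np.stack([local, medium, global_]).astype(np.float64)


def relation_aware_self_attention(queries, centers, scale_logits):
    """Single-head self-attention over object queries with a relation bias.

    scale_logits has shape (3,) and stands in for one decoder layer's
    learnable balance between local, medium and global relations.
    """
    n, d = queries.shape
    scores = queries @ queries.T / np.sqrt(d)
    scale_weights = softmax(scale_logits)
    # Weighted sum of the three scale masks gives an (N, N) additive bias.
    bias = np.tensordot(scale_weights, scale_masks(centers), axes=1)
    attn = softmax(scores + bias, axis=-1)
    return attn @ queries, attn, scale_weights


rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 8))   # 5 object queries, 8-dim embeddings
centers = rng.uniform(size=(5, 2))  # normalized box centers

# Hand-set logits, not learned values: an early layer favoring local
# relations and a late layer favoring global relations.
early_logits = np.array([2.0, 0.0, -2.0])
late_logits = np.array([-2.0, 0.0, 2.0])

out_early, attn_early, w_early = relation_aware_self_attention(
    queries, centers, early_logits)
out_late, attn_late, w_late = relation_aware_self_attention(
    queries, centers, late_logits)
```

In a trained model the logits would be parameters learned separately for each decoder layer. The hand-set values here only mimic the reported tendency to prioritize local relations early and global relations later.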

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Brain Tumor Detection and Classification

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings