PDNet: Toward Better One-Stage Object Detection With Prediction Decoupling
Li Yang, Yan Xu, Shaoru Wang, Chunfeng Yuan, Ziqi Zhang, Bing Li, Weiming Hu

TL;DR
PDNet introduces a prediction decoupling mechanism for one-stage object detection, improving accuracy by separately encoding targets and adaptively selecting inference positions for classification and localization.
Contribution
The paper proposes a novel prediction-target-decoupled framework with dynamic points for better target inference in one-stage detectors, outperforming existing methods on MS COCO.
Findings
Achieves 50.1 AP on MS COCO with a ResNeXt-64x4d-101-DCN backbone.
Outperforms state-of-the-art methods under the same experimental settings.
Demonstrates high efficiency as a one-stage detector.
Abstract
Recent one-stage object detectors follow a per-pixel prediction approach that predicts both the object category scores and boundary positions from every single grid location. However, the most suitable positions for inferring different targets, i.e., the object category and boundaries, are generally different. Predicting all these targets from the same grid location thus may lead to sub-optimal results. In this paper, we analyze the suitable inference positions for object category and boundaries, and propose a prediction-target-decoupled detector named PDNet to establish a more flexible detection paradigm. Our PDNet with the prediction decoupling mechanism encodes different targets separately in different locations. A learnable prediction collection module is devised with two sets of dynamic points, i.e., dynamic boundary points and semantic points, to collect and aggregate the…
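The idea of collecting predictions from positions other than the grid location itself can be illustrated with a small sketch. This is not the paper's implementation: the function names `bilinear_sample` and `collect_at_points`, the use of plain mean aggregation, and the hand-written offsets are all assumptions made for illustration. In a real detector the offsets would be predicted by the network and the aggregation would be learned.

```python
import math

def bilinear_sample(grid, y, x):
    """Bilinearly interpolate a 2D map (list of rows) at a fractional (y, x)."""
    h, w = len(grid), len(grid[0])
    # Clamp so that points drifting outside the map still return a value.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def collect_at_points(pred_map, anchor, offsets):
    """Sample a per-pixel prediction map at anchor + offset for each point
    and aggregate the samples (mean aggregation is an assumption here)."""
    ay, ax = anchor
    samples = [bilinear_sample(pred_map, ay + oy, ax + ox) for oy, ox in offsets]
    return sum(samples) / len(samples)

# Hypothetical 2x2 prediction map and two hand-picked "dynamic points".
pred_map = [[0.0, 1.0],
            [2.0, 3.0]]
value = collect_at_points(pred_map, anchor=(0, 0), offsets=[(0, 0), (1, 1)])
```

Under this sketch, classification and each boundary could use a different set of offsets over their own prediction maps, which is the decoupling the abstract describes in words.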
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
