Rethinking the Misalignment Problem in Dense Object Detection
Yang Yang, Min Li, Bo Meng, Junxing Ren, Degang Sun, Zihao Huang

TL;DR
This paper introduces SALT, a novel operator for dense object detection that spatially disentangles and aligns features for classification and localization tasks, improving detection accuracy.
Contribution
The paper proposes SALT, a plug-in operator that spatially disentangles task-specific features, together with a self-distillation regression (SDR) loss. Combined, they lead to consistent improvements in dense object detection performance.
Findings
SALT improves AP by approximately 2 points across various detectors.
SALT-Net achieves 53.8 AP on MS-COCO test-dev with a Res2Net-101-DCN backbone.
The methods enhance the alignment between classification and localization features.
Abstract
Object detection aims to localize and classify the objects in a given image, and these two tasks are sensitive to different object regions. Therefore, some locations predict high-quality bounding boxes but low classification scores, and some locations are quite the opposite. A misalignment exists between the two tasks, and their features are spatially entangled. In order to solve the misalignment problem, we propose a plug-in Spatial-disentangled and Task-aligned operator (SALT). By predicting two task-aware point sets that are located in each task's sensitive regions, SALT can reassign features from those regions and align them to the corresponding anchor point. Therefore, features for the two tasks are spatially aligned and disentangled. To minimize the difference between the two regression stages, we propose a Self-distillation regression (SDR) loss that can transfer knowledge from…
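The abstract describes SALT's feature reassignment only at a high level. The NumPy sketch below shows the general idea under several stated assumptions. Each anchor gathers features from two separate point sets, one for classification and one for regression, by bilinear sampling with zero padding. The sampled vectors are then averaged back onto the anchor. In a real detector the offsets would be predicted by convolution layers; here they are passed in directly. The function names (`bilinear_sample`, `task_aligned_features`, `salt_like_reassign`), the number of points `K`, and the mean aggregation are hypothetical illustration choices, not the authors' implementation. The SDR loss is not modeled.

```python
import numpy as np

# Hypothetical sketch of task-specific feature reassignment; NOT the authors' code.

def bilinear_sample(feat, y, x):
    """Sample a (C, H, W) feature map at fractional location (y, x).
    Neighbours outside the map contribute zero (zero padding)."""
    C, H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C, dtype=feat.dtype)
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                # Standard bilinear weight: closer neighbours count more.
                w = (1 - abs(y - yy)) * (1 - abs(x - xx))
                out += w * feat[:, yy, xx]
    return out

def task_aligned_features(feat, anchor, offsets):
    """Gather features at anchor + offsets (a (K, 2) array of (dy, dx))
    and align them to the anchor. Mean aggregation is an assumption."""
    ay, ax = anchor
    samples = [bilinear_sample(feat, ay + dy, ax + dx) for dy, dx in offsets]
    return np.mean(samples, axis=0)

def salt_like_reassign(feat, cls_offsets, reg_offsets):
    """Build classification- and regression-specific maps by sampling every
    anchor at its own task-specific point set.
    cls_offsets, reg_offsets: (H, W, K, 2) arrays of (dy, dx) offsets."""
    C, H, W = feat.shape
    cls_feat = np.zeros_like(feat)
    reg_feat = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            cls_feat[:, i, j] = task_aligned_features(feat, (i, j), cls_offsets[i, j])
            reg_feat[:, i, j] = task_aligned_features(feat, (i, j), reg_offsets[i, j])
    return cls_feat, reg_feat

# Usage: random features and random offsets for a 5x5 map with K = 9 points.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 5))
H, W, K = 5, 5, 9
cls_off = rng.uniform(-1.5, 1.5, size=(H, W, K, 2))
reg_off = rng.uniform(-1.5, 1.5, size=(H, W, K, 2))
cls_feat, reg_feat = salt_like_reassign(feat, cls_off, reg_off)
print(cls_feat.shape, reg_feat.shape)  # (4, 5, 5) (4, 5, 5)
```

Because the two point sets are sampled independently, each anchor's classification and regression features can come from different image regions. With all offsets set to zero, both outputs reduce to the input map, so the operator degenerates to the usual shared-feature head.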
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
